Systems and methods for enhancement of retinal images

ABSTRACT

Embodiments disclose systems and methods that aid in screening, diagnosis and/or monitoring of medical conditions. The systems and methods may allow, for example, for automated identification and localization of lesions and other anatomical structures from medical data obtained from medical imaging devices, computation of image-based biomarkers including quantification of dynamics of lesions, and/or integration with telemedicine services, programs, or software.

PRIORITY INFORMATION

This application is a continuation of U.S. patent application Ser. No.16/039,268, filed Jul. 18, 2018, which is a which is a continuation ofU.S. patent application Ser. No. 15/242,303, filed Aug. 19, 2016, nowabandoned, which is a continuation of U.S. patent application Ser. No.14/507,777, filed Oct. 6, 2014, now abandoned, which is a continuation,under 37 CFR 1.53(b), of U.S. patent application Ser. No. 14/266,688,filed Apr. 30, 2014, now U.S. Pat. No. 8,885,901, which in-turn claimsthe benefit of and priority to (under 35 U.S.C. § 119(e)) U.S.Provisional Patent Application No. 61/893,885, filed Oct. 22, 2013, thedisclosures of all of which are hereby incorporated by reference hereinin their entireties and should be considered a part of thisspecification. The parent application Ser. No. 14/266,688 was filed onthe same day as the following applications, U.S. patent application Ser.No. 14/266,749, now U.S. Pat. No. 8,879,813, U.S. patent applicationSer. No. 14/266,746, now U.S. Pat. No. 9,002,085, and U.S. patentapplication Ser. No. 14/266,753, now U.S. Pat. No. 9,008,391, and isalso related to U.S. patent application Ser. No. 14/500,929, filed Sep.29, 2014, now abandoned, and U.S. patent application Ser. No.15/238,674, filed Aug. 16, 2016, all of which are hereby incorporated byreference in their entireties herein.

STATEMENT REGARDING FEDERALLY SPONSORED R&D

The inventions disclosed herein were made with government support underGrants EB013585 and TR000377 awarded by the National Institutes ofHealth. The government has certain rights in the invention.

BACKGROUND OF THE DISCLOSURE

Imaging of human organs plays a critical role in diagnosis of multiplediseases. This is especially true for the human retina, where thepresence of a large network of blood vessels and nerves make it anear-ideal window for exploring the effects of diseases that harm vision(such as diabetic retinopathy seen in diabetic patients, cytomegalovirusretinitis seen in HIV/AIDS patients, glaucoma, and so forth) or othersystemic diseases (such as hypertension, stroke, and so forth). Advancesin computer-aided image processing and analysis technologies areessential to make imaging-based disease diagnosis scalable,cost-effective, and reproducible. Such advances would directly result ineffective triage of patients, leading to timely treatment and betterquality of life.

SUMMARY OF THE DISCLOSURE

In one embodiment a computing system for enhancing a retinal image isdisclosed. The computing system may include one or more hardwarecomputer processors; and one or more storage devices configured to storesoftware instructions configured for execution by the one or morehardware computer processors in order to cause the computing system to:access a medical retinal image for enhancement, the medical retinalimage related to a subject; compute a median filtered image

with a median computed over a geometric shape, at single or multiplescales; determine whether intensity at a first pixel location in themedical retinal image I(x,y) is lower than intensity at a same positionin the median filtered image

(x,y) for generating an enhanced image; if the intensity at the firstpixel location is lower, then set a value at the first pixel location inthe enhanced image to a value around a middle of a minimum and a maximumintensity value for the medical retinal image C_(mid) scaled by a ratioof intensity at medical retinal image to intensity in the medianfiltered image as expressed by

${C_{mid} \cdot \frac{I( {x,y} )}{I_{{Back},}( {x,y} )}};$

and if the intensity at the first pixel location is not lower, then seta value at the first pixel location in the enhanced image to a sum ofaround the middle of the minimum and the maximum intensity value for themedical retinal image, C_(mid), and (C_(mid)−1) scaled by a ratio of adifference of intensity of the median filtered image from intensity ofthe medical retinal original image to a difference of intensity of themedian filtered image from a maximum possible intensity value C_(max),expressed as

${C_{mid} + {( {C_{mid} - 1} ) \cdot \frac{{I( {x,y} )} - {I_{{Back},}( {x,y} )}}{C_{\max} - {I_{{Back},}( {x,y} )}}}};$

wherein the enhanced image is used to infer or further analyze, amedical condition of the subject.

In an additional embodiment, a computer-implemented method for enhancinga retinal image is disclosed. The method may include, as implemented byone or more computing devices configured with specific executableinstructions, accessing a medical retinal image for enhancement, themedical retinal image related to a subject; computing a median filteredimage

with a median computed over a geometric shape, at single or multiplescales; determining whether intensity at a first pixel location in themedical retinal image I(x,y) is lower than intensity at a same positionin the median filtered image

(x,y) for generating an enhanced image; if the intensity at the firstpixel location is lower, then setting a value at the first pixellocation in the enhanced image to a value around a middle of a minimumand a maximum intensity value for the medical retinal image C_(mid)scaled by a ratio of intensity at medical retinal image to intensity inthe median filtered image as expressed by

${C_{mid} \cdot \frac{I( {x,y} )}{I_{{Back},}( {x,y} )}};$

and if the intensity at the first pixel location is not lower, thensetting a value at the first pixel location in the enhanced image to asum of around the middle of the minimum and the maximum intensity valuefor the medical retinal image, C_(mid), and (C_(mid)−1) scaled by aratio of a difference of intensity of the median filtered image fromintensity of the medical retinal original image to a difference ofintensity of the median filtered image from a maximum possible intensityvalue C_(max), expressed as

${C_{mid} + {( {C_{mid} - 1} ) \cdot \frac{{I( {x,y} )} - {I_{{Back},}( {x,y} )}}{C_{\max} - {I_{{Back},}( {x,y} )}}}};$

using the enhanced image to infer or further analyze, a medicalcondition of the subject.

In a further embodiment, non-transitory computer storage that storesexecutable program instructions is disclosed. The non-transitorycomputer storage may include instructions that, when executed by one ormore computing devices, configure the one or more computing devices toperform operations including: accessing a medical retinal image forenhancement, the medical retinal image related to a subject; computing amedian filtered image

with a median computed over a geometric shape, at single or multiplescales; determining whether intensity at a first pixel location in themedical retinal image I(x,y) is lower than intensity at a same positionin the median filtered image

(x,y) for generating an enhanced image; if the intensity at the firstpixel location is lower, then setting a value at the first pixellocation in the enhanced image to a value around a middle of a minimumand a maximum intensity value for the medical retinal image C_(mid)scaled by a ratio of intensity at medical retinal image to intensity inthe median filtered image as expressed by

${C_{mid} \cdot \frac{I( {x,y} )}{I_{{Back},}( {x,y} )}};$

and if the intensity at the first pixel location is not lower, thensetting a value at the first pixel location in the enhanced image to asum of around the middle of the minimum and the maximum intensity valuefor the medical retinal image, C_(mid), and (C_(mid)−1) scaled by aratio of a difference of intensity of the median filtered image fromintensity of the medical retinal original image to a difference ofintensity of the median filtered image from a maximum possible intensityvalue C_(max), expressed as

${C_{mid} + {( {C_{mid} - 1} ) \cdot \frac{{I( {x,y} )} - {I_{{Back},}( {x,y} )}}{C_{\max} - {I_{{Back},}( {x,y} )}}}};$

using the enhanced image to infer or further analyze, a medicalcondition of the subject.

In an additional embodiment, a computing system for automated detectionof active pixels in retinal images is disclosed. The computing systemmay include one or more hardware computer processors; and one or morestorage devices configured to store software instructions configured forexecution by the one or more hardware computer processors in order tocause the computing system to: access a retinal image; generate a firstmedian normalized image using the retinal image with a median computedover a first geometric shape of a first size; generate a second mediannormalized image using the retinal image with a median computed over thefirst geometric shape of a second size, the second size different fromthe first size; automatically generate a difference image by computing adifference between the first median normalized image and the secondmedian normalized image; generate a binary image by computing ahysteresis threshold of the difference image using at least twothresholds to detect dark and bright structures in the difference image;apply a connected component analysis to the binary image to groupneighboring pixels of the binary image into a plurality of localregions; compute the area of each local region in the plurality of localregions; and store the plurality of local regions in a memory of thecomputing system.

In a further embodiment, a computer-implemented method for automateddetection of active pixels in retinal images is disclosed. The methodmay include, as implemented by one or more computing devices configuredwith specific executable instructions: accessing a retinal image;generating a first median normalized image using the retinal image witha median computed over a first geometric shape of a first size;generating a second median normalized image using the retinal image witha median computed over the first geometric shape of a second size, thesecond size different from the first size; automatically generating adifference image by computing a difference between the first mediannormalized image and the second median normalized image; generating abinary image by computing a hysteresis threshold of the difference imageusing at least two thresholds to detect dark and bright structures inthe difference image; applying a connected component analysis to thebinary image to group neighboring pixels of the binary image into aplurality of local regions; computing the area of each local region inthe plurality of local regions; and storing the plurality of localregions in a memory.

In another embodiment, non-transitory computer storage that storesexecutable program instructions is disclosed. The non-transitorycomputer storage may include instructions that, when executed by one ormore computing devices, configure the one or more computing devices toperform operations including: accessing a retinal image; generating afirst median normalized image using the retinal image with a mediancomputed over a first geometric shape of a first size; generating asecond median normalized image using the retinal image with a mediancomputed over the first geometric shape of a second size, the secondsize different from the first size; automatically generating adifference image by computing a difference between the first mediannormalized image and the second median normalized image; generating abinary image by computing a hysteresis threshold of the difference imageusing at least two thresholds to detect dark and bright structures inthe difference image; applying a connected component analysis to thebinary image to group neighboring pixels of the binary image into aplurality of local regions; computing the area of each local region inthe plurality of local regions; and storing the plurality of localregions in a memory.

In an additional embodiment, a computing system for automated generationof descriptors of local regions within a retinal image is disclosed, thecomputing system may include one or more hardware computer processors;and one or more storage devices configured to store softwareinstructions configured for execution by the one or more hardwarecomputer processors in order to cause the computing system to: access aretinal image; generate a first morphological filtered image using theretinal image, with a the said morphological filter computed over afirst geometric shape; generate a second morphological filtered imageusing the retinal image, with a morphological filter computed over asecond geometric shape, the second geometric shape having one or more ofa different shape or different size from the first geometric shape;generate a difference image by computing a difference between the firstmorphological filtered image and the second morphological filteredimage; and assign the difference of image pixel values as a descriptorvalue, each descriptor value corresponding to given pixel location ofthe said retinal image.

In a further embodiment, a computer-implemented method for automatedgeneration of descriptors of local regions within a retinal image isdisclosed. The method may include, as implemented by one or morecomputing devices configured with specific executable instructions:accessing a retinal image; generating a first morphological filteredimage using the retinal image, with a the said morphological filtercomputed over a first geometric shape; generating a second morphologicalfiltered image using the retinal image, with a morphological filtercomputed over a second geometric shape, the second geometric shapehaving one or more of a different shape or different size from the firstgeometric shape; generating a difference image by computing a differencebetween the first morphological filtered image and the secondmorphological filtered image; and assigning the difference of imagepixel values as a descriptor value, each descriptor value correspondingto given pixel location of the said retinal image.

In another embodiment, non-transitory computer storage that storesexecutable program instructions is disclosed. The non-transitorycomputer storage may include instructions that, when executed by one ormore computing devices, configure the one or more computing devices toperform operations including: accessing a retinal image; generating afirst morphological filtered image using the retinal image, with a thesaid morphological filter computed over a first geometric shape;generating a second morphological filtered image using the retinalimage, with a morphological filter computed over a second geometricshape, the second geometric shape having one or more of a differentshape or different size from the first geometric shape; generating adifference image by computing a difference between the firstmorphological filtered image and the second morphological filteredimage; and assigning the difference of image pixel values as adescriptor value, each descriptor value corresponding to given pixellocation of the said retinal image.

In an additional embodiment, a computing system for automated processingof retinal images for screening of diseases or abnormalities isdisclosed. The computing system may include: one or more hardwarecomputer processors; and one or more storage devices configured to storesoftware instructions configured for execution by the one or morehardware computer processors in order to cause the computing system to:access retinal images related to a patient, each of the retinal imagescomprising a plurality of pixels; for each of the retinal images,designate a first set of the plurality of pixels as active pixelsindicating that they include interesting regions of the retinal image,the designating using one or more of: conditional number theory, single-or multi-scale interest region detection, vasculature analysis, orstructured-ness analysis; for each of the retinal images, computedescriptors from the retinal image, the descriptors including one ormore of: morphological filterbank descriptors, median filterbankdescriptors, oriented median filterbank descriptors, Hessian baseddescriptors, Gaussian derivatives descriptors, blob statisticsdescriptors, color descriptors, matched filter descriptors, path openingand closing based morphological descriptors, local binary patterndescriptors, local shape descriptors, local texture descriptors, localFourier spectral descriptors, localized Gabor jets descriptors, edgeflow descriptors, and edge descriptors such as difference of Gaussians,focus measure descriptors such as sum-modified Laplacian, saturationmeasure descriptors, contrast descriptors, or noise metric descriptors;and classify one or more of: a pixel in the plurality of pixels, aninteresting region within the image, the entire retinal image, or acollection of retinal images, as normal or abnormal using supervisedlearning utilizing the computed descriptors, using one or more of: asupport vector machine, support vector regression, k-nearest neighbor,naive Bayes, Fisher linear discriminant, neural network, deep learning,or convolution networks.

In a further embodiment, a computer implemented method for automatedprocessing of retinal images for screening of diseases or abnormalitiesis disclosed. The method may include: accessing retinal images relatedto a patient, each of the retinal images comprising a plurality ofpixels; for each of the retinal images, designating a first set of theplurality of pixels as active pixels indicating that they includeinteresting regions of the retinal image, the designating using one ormore of: conditional number theory, single- or multi-scale interestregion detection, vasculature analysis, or structured-ness analysis; foreach of the retinal images, computing descriptors from the retinalimage, the descriptors including one or more of: morphologicalfilterbank descriptors, median filterbank descriptors, oriented medianfilterbank descriptors, Hessian based descriptors, Gaussian derivativesdescriptors, blob statistics descriptors, color descriptors, matchedfilter descriptors, path opening and closing based morphologicaldescriptors, local binary pattern descriptors, local shape descriptors,local texture descriptors, local Fourier spectral descriptors, localizedGabor jets descriptors, edge flow descriptors, and edge descriptors suchas difference of Gaussians, focus measure descriptors such assum-modified Laplacian, saturation measure descriptors, contrastdescriptors, or noise metric descriptors; and classifying one or moreof: a pixel in the plurality of pixels, an interesting region within theimage, the entire retinal image, or a collection of retinal images, asnormal or abnormal using supervised learning utilizing the computeddescriptors, using one or more of: a support vector machine, supportvector regression, k-nearest neighbor, naive Bayes, Fisher lineardiscriminant, neural network, deep learning, or convolution networks.

In another embodiment, non-transitory computer storage that storesexecutable program instructions is disclosed. The non-transitorycomputer storage may include instructions that, when executed by one ormore computing devices, configure the one or more computing devices toperform operations including: accessing retinal images related to apatient, each of the retinal images comprising a plurality of pixels;for each of the retinal images, designating a first set of the pluralityof pixels as active pixels indicating that they include interestingregions of the retinal image, the designating using one or more of:conditional number theory, single- or multi-scale interest regiondetection, vasculature analysis, or structured-ness analysis; for eachof the retinal images, computing descriptors from the retinal image, thedescriptors including one or more of: morphological filterbankdescriptors, median filterbank descriptors, oriented median filterbankdescriptors, Hessian based descriptors, Gaussian derivativesdescriptors, blob statistics descriptors, color descriptors, matchedfilter descriptors, path opening and closing based morphologicaldescriptors, local binary pattern descriptors, local shape descriptors,local texture descriptors, local Fourier spectral descriptors, localizedGabor jets descriptors, edge flow descriptors, and edge descriptors suchas difference of Gaussians, focus measure descriptors such assum-modified Laplacian, saturation measure descriptors, contrastdescriptors, or noise metric descriptors; and classifying one or moreof: a pixel in the plurality of pixels, an interesting region within theimage, the entire retinal image, or a collection of retinal images, asnormal or abnormal using supervised learning utilizing the computeddescriptors, using one or more of: a support vector machine, supportvector regression, k-nearest neighbor, naive Bayes, Fisher lineardiscriminant, neural network, deep learning, or convolution networks.

In an additional embodiment, a computing system for automatedcomputation of image-based lesion biomarkers for disease analysis isdisclosed. The computing system may include: one or more hardwarecomputer processors; and one or more storage devices configured to storesoftware instructions configured for execution by the one or morehardware computer processors in order to cause the computing system to:access a first set of retinal images related to one or more visits froma patient, each of the retinal images in the first set comprising aplurality of pixels; access a second set of retinal images related to acurrent visit from the patient, each of the retinal images in the secondset comprising a plurality of pixels; perform lesion analysiscomprising: detecting interesting pixels; computing descriptors from theimages; and classifying active regions using machine learningtechniques; conduct image-to-image registration of a second image fromthe second set and a first image from the first set using retinal imageregistration, the registration comprising: identifying pixels in thefirst image as landmarks; identifying pixels in the second image aslandmarks; computing descriptors at landmark pixels; matchingdescriptors across the first image and the second image; and estimatinga transformation model to align the first image and the second image;compute changes in lesions and anatomical structures in registeredimages; and quantify the changes in terms of statistics, wherein thecomputed statistics represent the image-based biomarker that can be usedfor one or more of: monitoring progression, early detection, ormonitoring effectiveness of treatment or therapy.

In a further embodiment, a computer implemented method for automatedcomputation of image-based lesion biomarkers for disease analysis isdisclosed. The method may include: accessing a first set of retinalimages related to one or more visits from a patient, each of the retinalimages in the first set comprising a plurality of pixels; accessing asecond set of retinal images related to a current visit from thepatient, each of the retinal images in the second set comprising aplurality of pixels; performing lesion analysis comprising: detectinginteresting pixels; computing descriptors from the images; andclassifying active regions using machine learning techniques; conductingimage-to-image registration of a second image from the second set and afirst image from the first set using retinal image registration, theregistration comprising: identifying pixels in the first image aslandmarks; identifying pixels in the second image as landmarks;computing descriptors at landmark pixels; matching descriptors acrossthe first image and the second image; and estimating a transformationmodel to align the first image and the second image; computing changesin lesions and anatomical structures in registered images; andquantifying the changes in terms of statistics, wherein the computedstatistics represent the image-based biomarker that can be used for oneor more of: monitoring progression, early detection, or monitoringeffectiveness of treatment or therapy.

In another embodiment, non-transitory computer storage that storesexecutable program instructions is disclosed. The non-transitorycomputer storage may include instructions that, when executed by one ormore computing devices, configure the one or more computing devices toperform operations including: accessing a first set of retinal imagesrelated to one or more visits from a patient, each of the retinal imagesin the first set comprising a plurality of pixels; accessing a secondset of retinal images related to a current visit from the patient, eachof the retinal images in the second set comprising a plurality ofpixels; performing lesion analysis comprising: detecting interestingpixels; computing descriptors from the images; and classifying activeregions using machine learning techniques; conducting image-to-imageregistration of a second image from the second set and a first imagefrom the first set using retinal image registration, the registrationcomprising: identifying pixels in the first image as landmarks;identifying pixels in the second image as landmarks; computingdescriptors at landmark pixels; matching descriptors across the firstimage and the second image; and estimating a transformation model toalign the first image and the second image; computing changes in lesionsand anatomical structures in registered images; and quantifying thechanges in terms of statistics, wherein the computed statisticsrepresent the image-based biomarker that can be used for one or more of:monitoring progression, early detection, or monitoring effectiveness oftreatment or therapy.

In an additional embodiment, a computing system for identifying thequality of an image to infer its appropriateness for manual or automaticgrading id disclosed. The computing system may include: one or morehardware computer processors; and one or more storage devices configuredto store software instructions configured for execution by the one ormore hardware computer processors in order to cause the computing systemto: access a retinal image related to a subject; automatically computedescriptors from the retinal image, the descriptors comprising a vectorof a plurality of values for capturing a particular quality of an imageand including one or more of: focus measure descriptors, saturationmeasure descriptors, contrast descriptors, color descriptors, texturedescriptors, or noise metric descriptors; and use the descriptors toclassify image suitability for grading comprising one or more of:support vector machine, support vector regression, k-nearest neighbor,naive Bayes, Fisher linear discriminant, neural network, deep learning,or convolution networks.

In a further embodiment, a computer implemented method for identifyingthe quality of an image to infer its appropriateness for manual orautomatic grading. The method may include: accessing a retinal imagerelated to a subject; automatically computing descriptors from theretinal image, the descriptors comprising a vector of a plurality ofvalues for capturing a particular quality of an image and including oneor more of: focus measure descriptors, saturation measure descriptors,contrast descriptors, color descriptors, texture descriptors, or noisemetric descriptors; and using the descriptors to classify imagesuitability for grading comprising one or more of: support vectormachine, support vector regression, k-nearest neighbor, naive Bayes,Fisher linear discriminant, neural network, deep learning, orconvolution networks.

In another embodiment, non-transitory computer storage that storesexecutable program instructions is disclosed. The non-transitorycomputer storage may include instructions that, when executed by one ormore computing devices, configure the one or more computing devices toperform operations including: accessing a retinal image related to asubject; automatically computing descriptors from the retinal image, thedescriptors comprising a vector of a plurality of values for capturing aparticular quality of an image and including one or more of: focusmeasure descriptors, saturation measure descriptors, contrastdescriptors, color descriptors, texture descriptors, or noise metricdescriptors; and using the descriptors to classify image suitability forgrading comprising one or more of: support vector machine, supportvector regression, k-nearest neighbor, naive Bayes, Fisher lineardiscriminant, neural network, deep learning, or convolution networks.

In one embodiment of the system, a retinal fundus image is acquired froma patient, then active or interesting regions comprising active pixelsfrom the image are determined using multi-scale background estimation.The inherent scale and orientation at which these active pixels aredescribed is determined automatically. A local description of the pixelsmay be formed using one or more of median filterbank descriptors, shapedescriptors, edge flow descriptors, spectral descriptors, mutualinformation, or local texture descriptors. One embodiment of the systemprovides a framework that allows computation of these descriptors atmultiple scales. In addition, supervised learning and classification canbe used to obtain a prediction for each pixel for each class of lesionor retinal anatomical structure, such as optic nerve head, veins,arteries, and/or fovea. A joint segmentation-recognition method can beused to recognize and localize the lesions and retinal structures. Inone embodiment of the system, this lesion information is furtherprocessed to generate a prediction score indicating the severity ofretinopathy in the patient, which provides context determining potentialfurther operations such as clinical referral or recommendations for thenext screening date. In another embodiment of the system, the automateddetection of retinal image lesions is performed using images obtainedfrom prior and current visits of the same patient. These images may beregistered using the disclosed system. This registration allows for thealignment of images such that the anatomical structures overlap, and forthe automated quantification of changes to the lesions. In addition,system may compute quantities including, but not limited to, appearanceand disappearance rates of lesions (such as microaneurysms), andquantification of changes in number, area, perimeter, location, distancefrom fovea, or distance from optic nerve head. These quantities can beused as image-based biomarker for monitoring progression, earlydetection, or evaluating efficacy of treatment, among many other uses.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows one embodiment in which retinal image analysis can beapplied.

FIG. 2 illustrates various embodiments of an image enhancement systemand process.

FIG. 3 is a block diagram of one embodiment for computing an enhancedimage of an input retinal image.

FIGS. 4A and 4C show examples of embodiments of retinal images taken ontwo different retinal devices.

FIGS. 4B and 4D show examples of embodiments of median normalizedimages.

FIGS. 4E and 4F demonstrate an example of embodiments of improved lesionand vessel visibility after image enhancement.

FIGS. 5A and 5B show examples of embodiments of retinal images.

FIGS. 5C and 5D show examples of embodiments of a retinal fundus mask.

FIGS. 6A and 6B show an example of embodiments of before and after noiseremoval.

FIG. 7A is a block diagram of one embodiment of a system for identifyingimage regions with similar properties across multiple images.

FIG. 7B is a block diagram of one embodiment of a system for identifyingan encounter level fundus mask.

FIGS. 8A, 8B, 8C, and 8D show examples of embodiments of retinal imagesfrom a single patient encounter.

FIGS. 8E, 8F, 8G, and 8H show examples of embodiments of a retinalimage-level fundus mask.

FIGS. 8I, 8J, 8K, and 8L show examples of embodiments of a retinalencounter-level fundus mask.

FIG. 9A depicts one embodiment of a process for lens dust artifactdetection.

FIGS. 9B, 9C, 9D, and 9E are block diagrams of image processingoperations used in an embodiment of lens dust artifact detection.

FIGS. 10A, 10B, 10C, and 10D show embodiments of retinal images fromencounters with lens dust artifact displayed in the insets.

FIG. 10E shows an embodiment of an extracted lens dust binary mask usingan embodiment of lens dust artifact detection.

FIGS. 10F, 10G, 10H, and 10I show embodiments of retinal images from oneencounter with lens dust artifact displayed in the inset.

FIG. 10J shows an embodiment of an extracted lens dust binary mask usingan embodiment of lens dust artifact detection.

FIGS. 10K, 10L, 10M, and 10N show embodiments of retinal images from oneencounter with lens dust artifact displayed in the inset.

FIG. 10O shows an extracted lens dust binary mask using an embodiment oflens dust artifact detection.

FIG. 11 is a block diagram of one embodiment for evaluating an interestregion detector at a particular scale.

FIG. 12A shows one embodiment of an example retinal fundus image.

FIG. 12B shows one embodiment of an example of interest region detectionfor the image in FIG. 12A using one embodiment of the interest regiondetection block.

FIG. 13A is a block diagram of one embodiment of registration oralignment of a given pair of images.

FIG. 13B is a block diagram of one embodiment of computation ofdescriptors for registering two images.

FIG. 14 shows embodiments of an example of keypoint matches used fordefining a registration model, using one embodiment of the imageregistration module.

FIG. 15 shows embodiments of an example set of registered images usingone embodiment of the image registration module.

FIG. 16 shows example embodiments of lens shot images.

FIG. 17 illustrates various embodiments of an image quality analysissystem and process.

FIG. 18 is a block diagram of one embodiment for evaluating gradabilityof a given retinal image.

FIG. 19 shows one embodiment of example vessel enhancement imagescomputed using one embodiment of the vesselness computation block.

FIG. 20A shows an example of embodiments of visibility of retinal layersin different channels of an image-color fundus image.

FIG. 20B shows one embodiment of a red channel of a retinal imagedisplaying vasculature from the choroidal layers.

FIG. 20C shows one embodiment of a green channel of a retinal imagewhich captures the retinal vessels and lesions.

FIG. 20D shows one embodiment of a blue channel of a retinal image whichdoes not capture much retinal image information.

FIG. 21 shows an example of one embodiment of an automatic image qualityassessment with a quality score output overlaid on retinal images.

FIG. 22 is a block diagram of one embodiment for generating a vesselenhanced image.

FIG. 23 shows one embodiment of a receiver operating characteristics(ROC) curve for vessel classification obtained using one embodiment of avesselness computation block on a STARE (Structured Analysis of theRetina) dataset.

FIG. 24 shows one embodiment of images generated using one embodiment ofa vesselness computation block.

FIG. 25 is a block diagram of one embodiment of a setup to localizelesions in an input retinal image.

FIG. 26A shows one embodiment of an example of microaneurysmslocalization.

FIG. 26B shows one embodiment of an example of hemorrhages localization.

FIG. 26C shows one embodiment of an example of exudates localization.

FIG. 27 shows one embodiment of a graph demonstrating performance of oneembodiment of the lesion localization module in terms of free responseROC plots for lesion detection.

FIG. 28 illustrates various embodiments of a lesion dynamics analysissystem and process.

FIG. 29A depicts an example of one embodiment of a user interface of atool for lesion dynamics analysis depicting persistent, appeared, anddisappeared lesions.

FIG. 29B depicts an example of one embodiment of a user interface of atool for lesion dynamics analysis depicting plots of lesion turnover.

FIG. 29C depicts an example of one embodiment of a user interface of atool for lesion dynamics analysis depicting overlay of the longitudinalimages.

FIG. 30 is a block diagram of one embodiment for evaluating longitudinalchanges in lesions.

FIGS. 31A and 31B show patches of aligned image patches from twolongitudinal images.

FIGS. 31C and 31D show persistent microaneurysms (MAs) along with thenew and disappeared MAs.

FIG. 32A shows a patch of an image with MAs.

FIG. 32B shows ground truth annotations marking MAs.

FIG. 32C shows MAs detected by one embodiment with a confidence of theestimate depicted by the brightness of the disk.

FIG. 33A shows embodiments of local registration refinement withbaseline and month 6 images registered and overlaid.

FIG. 33B shows embodiments of local registration refinement withbaseline image, and enhanced green channel when the dotted box shows aregion centered on the detected microaneurysm, and with an inset showinga zoomed version.

FIG. 33C shows embodiments of local registration refinement with a month6 image, enhanced green channel, the new lesion location afterrefinement correctly identified as persistent.

FIG. 34A shows embodiments of microaneurysms turnover (or appearance)rates ranges, number of MAs per year, computed (in gray), and groundtruth values (black circles) for various images in a dataset.

FIG. 34B shows embodiments of microaneurysms turnover (or disappearance)rates ranges, number of MAs per year, computed (in gray), and groundtruth values (black circles) for various images in a dataset.

FIG. 35 illustrates various embodiments of an image screening system andprocess.

FIG. 36A depicts an example of one embodiment of a user interface of atool for screening for a single encounter.

FIG. 36B depicts an example of one embodiment of a user interface of atool for screening with detected lesions overlaid on an image.

FIG. 36C depicts an example of one embodiment of a user interface of atool for screening for multiple encounters.

FIG. 36D depicts an example of one embodiment of a user interface of atool for screening for multiple encounters with detected lesionsoverlaid on an image.

FIG. 37 is a block diagram of one embodiment that indicates evaluationof descriptors at multiple levels.

FIG. 38 is a block diagram of one embodiment of screening for retinalabnormalities associated with diabetic retinopathy.

FIG. 39 shows an embodiment of a ROC plot for one embodiment ofscreening classifier with a 50/50 train-test split.

FIG. 40 shows an embodiment of a ROC plot for one embodiment on entiredataset with cross dataset training.

FIGS. 41A and 41B show embodiments of Cytomegalovirus retinitisscreening results using one embodiment of the Cytomegalovirus retinitisdetection module for “normal retina” category screened as “no refer”.

FIGS. 41C and 41D show embodiments of Cytomegalovirus retinitisscreening results using one embodiment of the Cytomegalovirus retinitisdetection module for “retina with CMVR” category screened as “refer”.

FIGS. 41E and 41F show embodiments of Cytomegalovirus retinitisscreening results using one embodiment of the Cytomegalovirus retinitisdetection module for “cannot determine” category screened as “refer”.

FIG. 42 is a block diagram of one embodiment of screening for retinalabnormalities associated with Cytomegalovirus retinitis.

FIG. 43A outlines the operation of one embodiment of an Image AnalysisSystem-Picture Archival and Communication System Application ProgramInterface (API).

FIG. 43B outlines the operation of an additional API.

FIG. 44 illustrates various embodiments of a cloud-based analysis andprocessing system and process.

FIG. 45 illustrates architectural details of one embodiment of acloud-based analysis and processing system.

FIG. 46 is a block diagram showing one embodiment of an imaging systemto detect diseases.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS I. Rapid Increase inRetinal Disease

Retinal diseases in humans can be manifestations of differentphysiological or pathological conditions such as diabetes that causesdiabetic retinopathy, cytomegalovirus that causes retinitis inimmune-system compromised patients with HIV/AIDS, intraocular pressurebuildup that results in optic neuropathy leading to glaucoma,age-related degeneration of macula seen in seniors, and so forth. Oflate, improved longevity and “stationary”, stress-filled lifestyles haveresulted in a rapid increase in the number of patients suffering fromthese vision threatening conditions. There is an urgent need for alarge-scale improvement in the way in which these diseases are screened,diagnosed, and treated.

Diabetes mellitus (DM), in particular, is a chronic disease whichimpairs the body's ability to metabolize glucose. Diabetic retinopathy(DR) is a common microvascular complication of diabetes, in whichdamaged retinal blood vessels become leaky or occluded, leading tovision loss. Clinical trials have demonstrated that early detection andtreatment of DR can reduce vision loss by 90%. Despite its preventablenature, DR is the leading cause of blindness in the adult working agepopulation. Technologies that allow early screening of diabetic patientswho are likely to progress rapidly would greatly help reduce the tolltaken by this blinding eye disease. This is especially important becauseDR progresses without much pain or discomfort until the patient suffersactual vision loss, at which point it is often too late for effectivetreatment. Worldwide, 371 million people suffer from diabetes and thisnumber is expected to grow to half a billion by 2030. The currentclinical guideline is to recommend annual DR screening for everyonediagnosed with diabetes. However, the majority of diabetics do not gettheir annual screening, for many reasons, including lack of access toophthalmology clinicians, lack of insurance, or lack of education. Evenif the patients have knowledge and experience, the number of cliniciansscreening for DR is an order of magnitude less than that required toscreen the current diabetic population. This is as true for first worldcountries, including America and Europe, as it is for the developingworld. The exponentially growing need for DR screening can be meteffectively by a computer-aided DR screening system, provided it isrobust, scalable, and fast.

For effective DR screening of diabetics, telescreening programs arebeing implemented worldwide. These programs use fundus photography,using a fundus camera typically deployed at a primary care facilitywhere the diabetic patients normally go for monitoring and treatment.Such telemedicine programs significantly help in expanding the DRscreening but are still limited by the need for human grading, of thefundus photographs, which is typically performed at a reading center.

II. High-Level Overview of an Automated Imaging System

Methods and systems are disclosed that provide automated image analysisallowing detection, screening, and/or monitoring of retinalabnormalities, including diabetic retinopathy, macular degeneration,glaucoma, retinopathy of prematurity, cytomegalovirus retinitis, andhypertensive retinopathy.

In some embodiments, the methods and systems can be used to conductautomated screening of patients with one or more retinal diseases. Inone embodiment, this is accomplished by first identifying interestingregions in an image of a patient's eye for further analysis, followed bycomputation of a plurality of descriptors of interesting pixelsidentified within the image. In this embodiment, these descriptors areused for training a machine learning algorithm, such as support vectormachine, deep learning, neural network, naive Bayes, and/or k-nearestneighbor. In one embodiment, these classification methods are used togenerate decision statistics for each pixel, and histograms for thesepixel-level decision statistics are used to train another classifier,such as one of those mentioned above, to allow screening of one or moreimages of the patient's eye. In one embodiment, a dictionary ofdescriptor sets is formed using a clustering method, such as k-means,and this dictionary is used to form a histogram of codewords for animage. In one embodiment, the histogram descriptors are combined withthe decision statistics histogram descriptors before trainingimage-level, eye-level, and/or encounter-level classifiers. In oneembodiment, multiple classifiers are each trained for specific lesiontypes and/or for different diseases. A score for a particular elementcan be generated by computing the distance of the given element from theclassification boundary. In one embodiment, the screening system isfurther included in a telemedicine system, and the screening score ispresented to a user of the telemedicine system.

The methods and systems can also be used to conduct automatedidentification and localization of lesions related to retinal diseases,including but not limited to diabetic retinopathy, macular degeneration,retinopathy of prematurity, or cytomegalovirus retinitis.

The methods and systems can also be used to compute biomarkers forretinal diseases based on images taken at different time intervals, forexample, approximately once every year or about six months. In oneembodiment, the images of a patient's eye from different visits areco-registered. The use of a lesion localization module allows for thedetection of lesions as well as a quantification of changes in thepatient's lesions over time, which is used as an image-based biomarker.

The methods and systems can also be used to conduct co-registration ofretinal images. In one embodiment, these images could be of differentfields of the eye, and in another embodiment these images could havebeen taken at different times.

The methods and systems can also be used to enhance images to make iteasier to visualize the lesions by a human observer or for analysis byan automated image analysis system.

FIG. 1 shows one embodiment in which retinal image analysis is applied.In this embodiment, the patient 19000 is imaged using a retinal imagingsystem 19001. The image/images 19010 captured are sent for processing ona computing cloud 19014, a computer or computing system 19004, or amobile device 19008. The results of the analysis are sent back to thehealth professional 19106 and/or to the retinal imaging system 19001.

The systems and methods disclosed herein include an automated screeningsystem that processes automated image analysis algorithms that canautomatically evaluate fundus photographs to triage patients with signsof diabetic retinopathy (DR) and other eye diseases. An automatedtelescreening system can assist an at-risk population by helping reducethe backlog in one or more of the following ways.

-   -   Seamlessly connecting primary care facilities with image reading        centers, so that an expert is not needed at the point of care;    -   Re-prioritizing expert appointments, so patients at greater risk        can be seen immediately by ophthalmologists;    -   Allowing primary care physicians and optometrists to use the        tools to make informed decisions regarding disease care; or    -   Improving patient awareness through visualization tools based on        lesion detection and localization.

For example, to screen an estimated 371 million diabetics worldwide, andto scale the screening operation as the diabetic population grows toover half a billion by 2030, one embodiment of the automated screeningsystem can be deployed at massive scales. At these numbers, it isrecognized that automation is not simply a cost-cutting measure to savethe time spent by the ophthalmologists, but rather it is the onlyrealistic way to screen such large, growing, patient population.

The critical need for computerized retinal image screening has resultedin numerous academic and a few commercial efforts at addressing theproblem of identifying and triaging patients with retinal diseases usingautomatic analysis of fundus photographs. For successful deployment,automated screening systems may include one or more of the followingfeatures:

i. High Sensitivity at a Reasonably High Specificity

For automated telescreening to gain acceptance among clinicians andadministrators, the accuracy, sensitivity and specificity should be highenough to match trained human graders, though not necessarily retinaexperts. Studies suggest that sensitivity of 85%, with high enoughspecificity, is a good target but other sensitivity levels may beacceptable.

ii. Invariance to the Training Data

Many prior approaches work by using algorithms that learn, directly orindirectly, from a set of examples of already graded fundus images. Thistraining data could have a key influence on the sensitivity andspecificity of the algorithm. An algorithm whose behavior variessignificantly between datasets is not preferred in some embodiments.Instead, in some embodiments, the computerized screening algorithmperforms well on cross-dataset testing, that is, the algorithmgeneralizes well, when trained on one dataset and tested on another.Hence, what is sometimes desired is a system that can generalize in arobust fashion, performing well in a cross-dataset testing scenario.

iii. Robustness Against Varying Conditions

In a deployed setup, an algorithm does not have control over the make ormodel of the camera, the illumination, the skill-level of thetechnician, or the size of the patient's pupil. Hence, in someembodiments, a computerized retinal disease screening system isconfigured to work in varying imaging conditions.

iv. Scalability to Massive Screening Setups:

In some embodiments, a screening system processes and grades large,growing databases of patient images. The speed at which the algorithmperforms grading can be important. In addition, testing time for a newimage to be screened remains constant even as the database grows, suchthat it does not take longer to screen a new test image as the databasesize increases as more patients are screened. What is sometimes desiredis a method that takes a constant time to evaluate a new set of patientimages even as the database size grows.

v. Interoperability with Existing Systems and Software:

In some embodiments, the system does not disrupt the existing workflowthat users are currently used to. This means that the systeminter-operates with a variety of existing software. What is sometimesdesired is a system that can be flexibly incorporated into existingsoftware and devices.

Customized methods for low-level description of medical imagecharacteristics that can lead to accuracy improvement is anotherpotential feature. Furthermore, approaches that leverage informationsuch as local scale and orientation within local image regions inmedical images, leading to greater accuracy in lesion detection couldalso provide many benefits.

In addition, the availability of an effective biomarker, a measurablequantity that correlates with the clinical progression of the diseaseand greatly enhances the clinical care available to the patients. Itcould also positively impact drug research, facilitating early andreliable determination of biological efficacy of potential newtherapies. It will be a greatly added benefit if the biomarker is basedonly on images, which would lead to non-invasive and inexpensivetechniques. Because retinal vascular changes often reflect or mimicchanges in other end organs, such as the kidney or the heart, thebiomarker may also prove to be a valuable assay of the overall systemicvascular state of a patient with diabetes.

Lesion dynamics, such as microaneurysm (MA) turnover, have received lessattention from academia or industry. Thus, a system that improves thelesion detection and localization accuracy could be beneficial.Furthermore, a system and method for computation of changes in retinalimage lesions over successive visits would also be of value by leadingto a variety of image-based biomarkers that could help monitor theprogression of diseases.

Certain aspects, advantages, and novel features of the systems andmethods have been and are described herein. It is to be understood thatnot necessarily all such advantages or features may be achieved inaccordance with any particular embodiment. Thus, for example, thoseskilled in the art will recognize that the systems and methods may beembodied or carried out in a manner that achieves one advantage/featureor group of advantages/features as taught herein without necessarilyachieving other advantages/features as may be taught or suggestedherein.

III. Automated Low-Level Image Processing

In some embodiments, the systems and methods provide for variousfeatures of automated low-level image processing, which may includeimage enhancement or image-level processing blocks.

A. Image Enhancement

In some embodiments, the system may also make it easier for a human oran automated system to evaluate a retinal image and to visualize andquantify retinal abnormalities. Retinal fundus images can be acquiredfrom a wide variety of cameras, under varying amounts of illumination,by different technicians, and on different people. From an imageprocessing point of view, these images have different colors levels,different dynamic ranges, and different sensor noise levels. This makesit difficult for a system to operate on these images using the sameparameters. Human image graders or experts may also find it a hindrancethat the images often look very different overall. Therefore, in someembodiments, the image enhancement process applies filters on the imagesto enhance them in such a way that their appearance is neutralized.After this image enhancement processing, the enhanced images can beprocessed by the same algorithms using identical or substantiallysimilar parameters.

FIG. 2 shows one embodiment of a detailed view of the differentscenarios in which image enhancement can be applied. In one scenario,the patient 29000 is imaged by an operator 29016 using an image capturedevice 29002. In this embodiment, the image capture device is depictedas a retinal camera. The images captured are sent to a computer orcomputing system 29004 for image enhancement. Enhanced images 29202 arethen sent for viewing or further processing on the cloud 19014, or acomputer or computing device 19004 or a mobile device 19008. In anotherembodiment, the images 29004 could directly be sent to the cloud 19014,the computer or computing device 19004, or the mobile device 19008 forenhancement and/or processing. In the second scenario, the patient 29000may take the image himself using an image capture device 29006, which inthis case is shown as a retinal camera attachment for a mobile device29008. The image enhancement is then performed on the mobile device29008. Enhanced images 29204 can then be sent for viewing or furtherprocessing.

FIG. 3 gives an overview of one embodiment of computing an enhancedimage. The blocks shown here may be implemented in the cloud 19014, on acomputer or computing system 19004, or a mobile device 19008, or thelike. The image 100 refers in general to the retinal data, single ormultidimensional, that has been captured using a retinal imaging device,such as camera for color image capture, fluorescein angiography (FA),adaptive optics, optical coherence tomography (OCT), hyperspectralimaging, scanning laser ophthalmoscope (SLO), wide-field imaging orultra-wide-field imaging. Background estimation block 800 estimates thebackground of the image 100 at a given scale. Adaptive intensity scaling802 is then applied to scale the image intensity based on localbackground intensity levels. Image enhancement module 106 enhances theimage to normalize the effects of lighting, different cameras, retinalpigmentation and the like. An image is then created thatexcludes/ignores objects smaller than a given size.

In one embodiment, the images are first subjected to an edge-preservingbilateral filter such as the filter disclosed in Carlo Tomasi andRoberto Manduchi, “Bilateral Filtering for Gray and Color Images,” inComputer Vision, 1998. Sixth International Conference on, 1998, 839-846;and Ben Weiss, “Fast Median and Bilateral Filtering,” in ACMTransactions on Graphics (TOG), vol. 25, 2006, 519-526. The filterremoves noise without affecting important landmarks such as lesions andvessels.

In one embodiment, the system then uses a median filter-basednormalization technique, referred to as median normalization, to locallyenhance the image at each pixel using local background estimation. Insome embodiments, the median normalized image intensity

at pixel location (x,y) is computed as:

$\begin{matrix}{{I_{{Norm},}( {x,y} )} = \{ \begin{matrix}{C_{mid} + {( {C_{mid} - 1} ) \cdot \frac{{I( {x,y} )} - {I_{{Back},}( {x,y} )}}{C_{\max} - {I_{{Back},}( {x,y} )}}}} & {{{{if}\mspace{14mu} {I( {x,y} )}} \geq {I_{{Back},}( {x,y} )}},} \\{C_{mid} \cdot \frac{I( {x,y} )}{I_{{Back},}( {x,y} )}} & {otherwise}\end{matrix} } & {{Equation}\mspace{14mu} 1}\end{matrix}$

where I is the input image with pixel intensities in the range [C_(min),C_(max)]=[0, 2^(B)−1], B is the image bit-depth,

is background image obtained using a median filter over the area

, and C_(mid)=2^(B−1) is the “middle” gray pixel intensity value inimage I. For an 8-bit image, [C_(min), C_(max)]=[0,255], andC_(mid)=128. In one embodiment,

is chosen to be a circle of radius r=100. In one embodiment, a circle, asquare, or a regular polygon is used. In addition, a square maybe usedwith a pre-defined size.

FIGS. 4B and 4D show embodiments of some example median normalizedimages for the input images shown in FIG. 4A and FIG. 4C respectively.Note that this normalization improves the visibility of structures suchas lesions and vessels in the image as shown in FIG. 4E and FIG. 4F. Theinset in FIGS. 4E and 4F show the improved visibility of microaneurysmlesions. The results of this image enhancement algorithm have also beenqualitatively reviewed by retina experts at Doheny Eye Institute, andthey concur with the observations noted here. The effectiveness of thisalgorithm is demonstrated by superior cross-dataset performance of thesystem described below in the section entitled “Screening Using LesionClassifiers Trained On Another Dataset (Cross-Dataset Testing).”

B. Image-Level Processing

1. Image-Level Fundus Mask Generation

Typically, retinal fundus photographs have a central circular region ofthe eye visible, with a dark border surrounding it. Sometimesinformation pertaining to the patient, or the field number may also beembedded in the corners of the photograph. For retinal image analysis,these border regions of the photograph do not provide any usefulinformation and therefore it is desirable to ignore them. In oneembodiment, border regions of the retinal photographs are automaticallyidentified using morphological filtering operations as described below.

In one embodiment, the input image is first blurred using a medianfilter. A binary mask is then generated by thresholding this image sothat locations with pixel intensity values above a certain threshold areset to 1 in the mask, while other areas are set to 0. The threshold isempirically chosen so as to nullify the pixel intensity variations inthe border regions, so that they go to 0 during thresholding. In oneembodiment, this threshold is automatically estimated. The binary maskis then subjected to region dilation and erosion morphologicaloperations, to obtain the final mask. In one embodiment, the medianfilter uses a radius of 5 pixels, and, the threshold for binary maskgeneration is 15 for an 8-bit image with pixel values ranging from[0,255], though other radii and thresholds can be used. The dilation anderosion operations can be performed using rectangular structuringelements, such as, for example, size 10 and 20 pixels respectively. FIG.5A and FIG. 5B show two different retinal image types, and FIG. 5C andFIG. 5D show embodiments of fundus masks for these two images generatedusing the above described embodiment.

2. Optic Nerve Head Detection

In some embodiments, it may be beneficial to detect the optic nerveheard (ONH) within a retinal image. A ONH can be robustly detected usingan approach that mirrors the one for lesions as described in sectionbelow entitled “Lesion Localization”. In another embodiment,multi-resolution decomposition and template matching is employed for ONHlocalization.

In one embodiment, the ONH localization is performed on a fullresolution retinal fundus image, or a resized version of the image, orthe image (full or resized) processed using one or more morphologicalfilters that can be chosen from minimum filter or maximum filter,dilation filter, morphological wavelet filter, or the like. Anapproximate location of the ONH is first estimated in the horizontaldirection by filtering horizontal strips of the image whose height isequal to the typical ONH diameter and width is equal to the image width,with a filter kernel of size approximately equal to the typical ONHsize. The filter kernel can be: a circle of specific radius, square ofspecific side and orientation, Gaussian of specific sigmas (that is,standard deviations), ellipse of specific orientation and axes,rectangle of specific orientation and sides, or a regular polygon ofspecific side. The filtered image strips are converted to aone-dimensional signal by collating the data along the verticaldimension by averaging or taking the maximum or minimum or the like. Thelargest N local maxima of the one-dimensional signal whose spatiallocations are considerably apart are considered as likely horizontallocations of the ONH since the ONH is expected to be a bright region. Ina similar fashion, the vertical position of the ONH is approximated byexamining vertical image strips centered about the N approximatehorizontal positions. This ONH position approximation technique producesM approximate locations for the ONH.

In one embodiment, the approximate sizes or radii of the possible ONHscan be estimated by using a segmentation algorithm such as themarker-controlled watershed algorithm. In one embodiment the markers areplaced based on the knowledge of the fundus mask and approximate ONHlocation. In another embodiment, typical ONH sizes or radii can also beused as approximate ONH sizes or radii.

In one embodiment, these approximate locations and sizes for the ONH canbe refined by performing template matching in a neighborhood about theseapproximate ONH locations and choosing the one location and size thatgives the maximum confidence or probability of ONH presence.

In another embodiment, the ONH position can be estimated as the vertexof the parabola approximation to the major vascular arch.

3. Image Size Standardization

Different retinal fundus cameras capture images at varying resolutionsand field of view. In order to process these different resolution imagesusing the other blocks, in one embodiment the images are standardized byscaling them to have identical or near identical pixel pitch. The pixelpitch is computed using the resolution of the image and field of viewinformation from the metadata. In one embodiment, if a field of viewinformation is absent, then the pixel pitch is estimated by measuringthe optic nerve head (ONH) size in the image as described in the sectionabove entitled “Optic Nerve Head Detection.” In one embodiment, anaverage ONH size of 2 mm can be used. The image at the end of sizestandardization is referred to as I^(s) ⁰ . The fundus mask is generatedfor I^(s) ⁰ and can be used for further processing. In anotherembodiment, the diameter of the fundus mask is used as a standardquantity for the pitch. The diameter may be calculated as described inthe section above entitled “Image-Level Fundus Mask Generation” or inthe section below entitled “Encounter-Level Fundus Mask Generation.”

4. Noise Removal

Fundus images usually have visible sensor noise that can potentiallyhamper lesion localization or detection. In order to reduce the effectof noise while preserving lesion and vessel structures, in oneembodiment a bilateral filter may be used, such as, for example, thefilter disclosed in Tomasi and Manduchi, “Bilateral Filtering for Grayand Color Images”, and Weiss, “Fast Median and Bilateral Filtering.”Bilateral filtering is a normalized convolution operation in which theweighting for each pixel p is determined by the spatial distance fromthe center pixel s, as well as its relative difference in intensity. Inone embodiment, for input image I, output image J, and window Ω, thebilateral filtering operation is defined as follows:

$J_{s} = {\sum\limits_{p \in \Omega}{{f( {p - s} )}{g( {I_{p} - I_{s}} )}I_{p}\text{/}{\sum\limits_{p \in \Omega}{{f( {p - s} )}{g( {I_{p} - I_{s}} )}}}}}$

where f and g are the spatial and intensity weighting functionsrespectively, which are typically Gaussian. In one embodiment, theparameters of the bilateral filter have been chosen to induce thesmoothing effect so as not to miss small lesions such as microaneurysms.FIG. 6A shows one embodiment of an enlarged portion of a retinal imagebefore noise removal and FIG. 6B shows one embodiment of the sameportion after noise removal. It can be observed that the sensor noise isgreatly suppressed while preserving lesion and vessel structures.

C. Encounter-Level Processing

While capturing images using commercial cameras, retinal cameras, ormedical imaging equipment, several images could be captured in a shortduration of time without changing the imaging hardware. These imageswill have certain similar characteristics that can be utilized forvarious tasks, such as image segmentation, detection, or analysis.However, the images possibly may have different fields of view orillumination conditions.

In particular, medical or retinal images captured during a patient visitare often captured using the same imaging set-up. The set of theseimages is termed an “encounter” of that patient on that date. For thespecific case of retinal images, data from multiple images in anencounter can be used to produce fundus segmentation masks and detectimage artifacts due to dust or blemishes as described in the sectionsthat follow.

1. Encounter-Level Fundus Mask Generation

Many medical images such as those acquired using ultrasound equipmentand those of the retina have useful information only in a portion of therectangular image. In particular, most retinal fundus photographs have acentral circle-like region of the eye visible, with the remainder of thephotograph being dark. Information pertaining to the patient or thefield number may be embedded in the regions of the photograph that donot contain useful image information. Therefore, before analysis of suchphotographs, it is desirable to identify regions of the photographs withuseful image information using computer-aided processes and algorithms.One benefit of such identification is that it reduces the chances offalse positives in the border regions. Additionally, this identificationcan reduce the analysis complexity and time for these images since asubset of pixels in the photographs is to be processed and analyzed.

FIG. 7A depicts one embodiment of an algorithmic framework to determineregions without useful image information from images captured during anencounter. The illustrated blocks may be implemented on the cloud 19014,a computer or computing device 19004, a mobile device 19008, the like asshown in FIG. 1. This analysis may be helpful when regions with usefulinformation are sufficiently different across the images in an encountercompared to the outside regions without useful information. The N images74802 in the encounter are denoted as I⁽¹⁾, I⁽²⁾ . . . I^((N)). Theregions that are similar across the images in the encounter aredetermined as those pixel positions where most of the pair-wisedifferences 74804 are small in magnitude 74806. The regions that aresimilar across most of the N images in the encounter include regionswithout useful image information. However, these regions also includeportions of the region with useful image information that are alsosimilar across most of the images in the encounter. Therefore, toexclude such similar regions with useful information, additionalconstraints 74808 can be included and logically combined 74810 with theregions determined to be similar and obtain the fundus mask 74812. Forexample, regions outside the fundus portion of retinal images usuallyhave low pixel intensities and can be used to determine which region toexclude.

FIG. 7B depicts one embodiment of an algorithmic framework thatdetermines a fundus mask for the retinal images in an encounter. In oneembodiment, the encounter-level fundus mask generation may besimplified, with low loss in performance by using only the red channelof the retinal photographs denoted as I^((1),r), I^((2),r) . . .I^((N),r) 74814. This is because in most retinal photographs, the redchannel has very high pixel values within the fundus region and smallpixel values outside the fundus region. The noise may be removed fromthe red channels of the images in an encounter as described in thesection above entitled “Noise Removal”. Then, the absolute differencesbetween possible pairs of images in the encounter are computed 74816 andthe median across the absolute difference images is evaluated 74818.Pixels at a given spatial position in the images of an encounter aredeclared to be outside the fundus if the median of the absolutedifference images 74818 at that position is low (for example, close tozero), 74820, and 74824 if the median of those pixel values is alsosmall 74822. The fundus mask 74828 is obtained by logically negating74826 the mask indicating regions outside the fundus.

In particular, for retinal images, prior techniques to determine fundusmasks include processing one retinal image at a time, which are based onthresholding the pixel intensities in the retinal image. Although theseimage-level fundus mask generation algorithms may be accurate for someretinal fundus photographs, they could fail for photographs that havedark fundus regions, such as those shown in FIG. 8. The failure of theimage-level fundus mask generation algorithm as in FIG. 8E and FIG. 8His primarily due to the pixel intensity thresholding operation thatdiscards dark regions that have low pixel intensities in the imagesshown in FIG. 8A and FIG. 8D.

The drawbacks of image-level fundus mask generation can be overcome bycomputing a fundus mask using multiple images in an encounter, that is agiven visit of a given patient. For example, three or more images in anencounter may be used if the images in the encounter have been capturedusing the same imaging hardware and settings and hence have the samefundus mask. Therefore, the encounter-level fundus mask computed usingdata from multiple images in an encounter will be more robust for lowpixel intensities in the regions with useful image information.

Embodiments of encounter-level fundus masks generated using multipleimages within an encounter are shown in FIGS. 8I, 8J, 8K, and 8L. It canbe noted that in FIGS. 8A and 8D, pixels with low intensity values thatare within the fundus regions are correctly identified by theencounter-level fundus mask shown in FIGS. 8I and 8L, unlike in theimage-level fundus masks shown in FIGS. 8E and 8H.

In one embodiment, the fundus mask generation algorithm validates thatthe images in an encounter share the same fundus mask by computing theimage-level fundus masks and ensuring that the two masks obtained differin less than, for example, 10% of the total number of pixels in eachimage by logically “AND”-ing and “OR”-ing the individual image-levelfundus masks. If the assumption is not validated, the image-level fundusmasks are used and the encounter-level fundus masks are not calculated.Median values of absolute differences that are close to zero can beidentified by hysteresis thresholding, for example by using techniquesdisclosed in John Canny, “A Computational Approach to Edge Detection,”IEEE Transactions on Pattern Analysis and Machine Intelligence, No. 6(1986): 679-698. In one embodiment, the upper threshold is set to −2,and the lower threshold is set to −3, such that medians of the pixelvalues are determined to be small if they are less than 15, the samevalue used for thresholding pixel values during image-level fundus maskgeneration.

2. Lens or Sensor Dust and Blemish Artifact Detection

Dust and blemishes in the lens or sensor of an imaging device manifestas artifacts in the images captured using that device. In medicalimages, these dust and blemish artifacts can be mistaken to bepathological manifestations. In particular, in retinal images, the dustand blemish artifacts can be mistaken for lesions by both human readersand image analysis algorithms. However, detecting these artifacts usingindividual images is difficult since the artifacts might beindistinguishable from other structures in the image. Moreover, sinceimages in an encounter are often captured using the same imaging deviceand settings, the blemish artifacts in these images will be co-locatedand similar looking. Therefore, it can be beneficial to detect the dustand blemish artifacts using multiple images within an encounter. Imageartifacts due to dust and blemishes on the lens or in the sensor aretermed as lens dust artifacts for simplicity and brevity, since they canbe detected using similar techniques within the framework describedbelow.

FIG. 9A depicts one embodiment of a process for lens dust artifactdetection. The blocks for lens dust artifact detection may beimplemented on the cloud 19014, or a computer or computing device 19004,a mobile device 19008, or the like, as shown in FIG. 1. The individualimages are first processed 92300 to detect structures that couldpossibly be lens dust artifacts. Detected structures that are co-locatedacross many of the images in the encounter are retained 92304, while theothers are discarded. The images in the encounter are also independentlysmoothed 92302 and processed to determine pixel positions that aresimilar across many images in the encounter 92306. The lens dust mask92310 indicates similar pixels that also correspond to co-locatedstructures 92308 as possible locations for lens dust artifacts.

Additional information about embodiments of each of these blocks of thelens dust detection algorithm is discussed below. In one embodiment,lens dust detection is disabled if there are fewer than three images inthe encounter, since in such a case, the lens dust artifacts detectedmay not be reliable. Moreover, the lens dust detection uses the red andblue channels of the photographs since vessels and other retinalstructures are most visible in the green channel and can accidentallyalign in small regions and be misconstrued as lens dust artifacts. Thelens dust artifacts are detected using multiple images in the encounteras described below and indicated by a binary lens dust mask which hastrue values at pixels most likely due to lens dust artifacts.

In one embodiment, noise may be removed from the images in the encounterusing the algorithm described in the section above entitled “NoiseRemoval”. These denoised images are denoted as I⁽¹⁾, I⁽²⁾, . . . I^((N))where N is the total number of images in the encounter and theindividual channels of the denoised images are denoted as I^((i),c)where c=r and b indicates which of the red or blue channels is beingconsidered. If N≥3 and the image-level fundus masks are consistent, forexample as determined by performing encounter-level fundus maskgeneration, the input images comprising the red and blue channels areindividually normalized and/or enhanced using the processes described inthe section above entitled “Image Enhancement.” As shown in FIG. 9B, foreach channel of each input image I^((i),c), two enhanced images aregenerated using different radii for the median filter: I_(h) ^((i),c)with radius h 92312 and I_(l) ^((i),c) with radius l 92314. Thedifference between the two enhanced images I_(diff) ^((i),c)=(I_(h)^((i),c)−I_(l) ^((i),c)) is calculated 92316 and hysteresis thresholdedusing different thresholds 92318 and 92320 to detect dark and brightstructures from which large and elongated structures are removed 92322and 92324. The masks indicating the bright and dark structures arelogically “OR”-ed 92326 to obtain the mask M_(bright,dark) ^((i),c)92328.

As shown in FIG. 9C, the mask M_(bright,dark) ^((i),r) for the redchannel and the mask M_(bright,dark) ^((i),b) for the blue channel arefurther logically “OR”-ed to get a single mask M_(bright,dark) ^((i))92334 showing locations of bright and dark structures that are likely tobe lens dust artifacts in the image I^((i)). If a spatial location isindicated as being part of a bright or dark structure in more than 50%of the images in the encounter 92336, it is likely that a lens dustartifact is present at that pixel location. This is indicated in abinary mask M_(colocated struct) 92338.

The normalized images I_(l) ^((i),c), i=1, 2, . . . N, c=r, b areprocessed using a Gaussian blurring filter 92330 to obtain smoothedversions I_(h,smooth) ^((i),c) 92332 as shown in FIG. 9B. Then as shownin FIG. 9D, pair-wise absolute differences 92342 of these smoothed,normalized images are generated. In one embodiment, the difference 92348between the 80th percentile a high percentile (for example, 92344) and20th percentile a lower percentile (for example, 92346) of theseabsolute differences is computed as I_(diff range) ^(c) and hysteresisthresholded 92350 to obtain a mask M_(similarity) ^(c), c=r, b 92352that indicates the spatial image locations where the images are similarwithin each of the red and blue channels.

Finally as illustrated in FIG. 9E, the lens dust mask 92310 for theimages in the encounter is obtained by logically “AND”-ing 92356 themask M_(colocated struct) 92338 indicating co-located structures and thelogically “OR”-ed 92354 per-channel similarity masks M_(similarity)^(c), c=r, b 92352.

FIG. 10 shows embodiments of retinal images from encounters with lensdust artifacts shown in the insets. Lens dust artifacts in images 1through 4 of three different encounters are indicated by the blackarrows within the magnified insets. The lens dust masks obtained for thethree encounters using the above described process are shown in FIGS.10E, 10J, and 10O. Encounter A (FIGS. 10A, 10B, 10C, and 10D) has apersistent bright light reflection artifact that is captured in the lensdust mask in FIG. 10E. The lens dust mask also incorrectly marks somesmooth regions without lesions and vessels along the edge of the fundusthat are similar across the images (top-right corner of FIG. 10E).However, such errors do not affect retinal image analysis since theregions marked do not have lesions nor vessels of interest. Encounter B(FIGS. 10F, 10G, 10H, and 10I) has a large, dark lens dust artifact thatis captured in the lens dust mask in FIG. 10J. Encounter C (FIGS. 10K,10L, 10M, and 10N) has a tiny, faint, microaneurysm-like lens dustartifact that is persistent across multiple images in the encounter. Itis detected by the process and indicated in the lens dust mask in FIG.10O.

In one embodiment, median filter radii of h=100 pixels and 1=5 pixelsare used to normalize the images. The hysteresis thresholding of themedian normalized difference I_(diff) ^((i),c) to obtain the bright maskis performed using an upper threshold that is the maximum of 50 and the99th percentile of the difference values and a lower threshold that isthe maximum of 40 and the 97th percentile of the difference values. Thedark mask is obtained by hysteresis thresholding −I_(diff) ^((i),c) (thenegative of the median normalized difference) with an upper threshold;for example, the minimum of 60 and the 99th percentile of −I_(diff)^((i),c) and a lower threshold that is the minimum of 50 and the 97thpercentile of −I_(diff) ^((i),c). In one embodiment, groups of pixelswith eccentricity less than 0.97 and with more than 6400 pixels arediscarded. The smoothed normalized image I_(h,smooth) ^((i),c) isobtained using a Gaussian smoothing filter with σ=2. To obtain thesimilarity mask as shown in FIG. 9D, −I_(diff range) ^(c) (the negativedifference of the 80th and 20th percentile of the pair-wise absolutedifferences of I_(h,smooth) ^((i),c)) is hysteresis thresholded with anupper threshold that is the maximum of −5 and 95th percentile of−I_(diff range) ^(c) and a lower threshold that is the minimum of −12and 90th percentile of −I_(diff range) ^(c). However, it is recognizedthat other values may be used to implement the processor.

D. Interest Region Detection

Typically, a large percentage of a retinal image comprises of backgroundretina pixels which do not contain any interesting pathological oranatomic structures. Identifying interesting pixels for futureprocessing can provide significant improvement in processing time, andin reducing false positives. To extract interesting pixels for a givenquery, multi-scale morphological filterbank analysis is used. Thisanalysis allows the systems and methods to be used to construct interestregion detectors specific to lesions of interest. Accordingly, a queryor request can be submitted which has parameters specific to aparticular concern. As one example, the query may request the system toreturn “bright blobs larger than 64 pixels in area but smaller than 400pixels”, or “red elongated structures that are larger than 900 pixels”.A blob includes a group of pixels with common local image properties.

FIG. 11 depicts one embodiment of a block diagram for evaluatinginterest region pixels at a given scale. The illustrated blocks may beimplemented either on the cloud 19014, a computer or computing device19004, a mobile device 19008 or the like, as shown in FIG. 1. Scaledimage 1200 is generated by resizing image 100 to a particular value.“Red/Bright?” 1202 indicates whether the lesion of interest is red orbright. Maximum lesion size 1204 indicates the maximum area (in pixels)of the lesion of interest. Minimum lesion size 1206 indicates theminimum area (in pixels) of the lesion of interest. Median normalized(radius r) 1208 is output of image enhancement block 106 when thebackground estimation is performed using a disk of radius r. Mediannormalized difference 1210 is the difference between two mediannormalized images 1208 obtained with different values of radius r.Determinant of Hessian 1212 is a map with the determinant of the Hessianmatrix at each pixel. Local peaks in determinant of Hessian 1214 is abinary image with local peaks in determinant of Hessian marked out.Color mask 1216 is a binary image with pixels in the median normalizeddifference image 1210 over or below a certain threshold marked.Hysteresis threshold mask 1218 is a binary image obtained afterhysteresis thresholding of input image. Masked color image 1220 is animage with just the pixels marked by color mask 1216 set to values asper median normalized difference image 1210. The pixel locationsindicated by the local peaks in determinant of Hessian 1214 can be setto the maximum value in the median normalized difference image 1210incremented by one. Final masked image 1222 is an image obtained byapplying the hysteresis threshold mask 1218 to masked color image 1220.Interest region at a given scale 1224 is a binary mask marking interestregions for further analysis.

Retinal fundus image I^(s) ⁰ is scaled down by factor f, n times andscaled images I^(s) ⁰ , I^(s) ¹ . . . I^(s) ^(n) are obtained. In oneembodiment, the ratio between different scales is set to 0.8 and 15scales are used. At each scale s_(k), the median normalized imagesI_(Norm,r) _(h) ^(s) ^(k) and I_(Norm,r) _(l) ^(s) ^(k) are computedwith radius r_(h) and r_(l), r_(h)>r_(l) as defined by Equation 1 where

defined as a circle of radius r. In one embodiment values of r_(h)=7 andr_(l)=3 can be used. Then, the difference image I_(diff) ^(s) ^(k)=I_(Norm,r) _(h) ^(s) ^(k) −I_(Norm,r) _(l) ^(s) ^(k) is convolved witha Gaussian kernel, and gradients L_(xx) (x,y), L_(xy)(x,y), L_(xy)(x,y)and L_(yy)(x,y) are computed on this image. The Hessian H is computed ateach pixel location (x,y) of the difference as:

$\begin{matrix}{{H( {x,y} )} = \begin{bmatrix}{L_{xx}( {x,y} )} & {L_{xy}( {x,y} )} \\{L_{xy}( {x,y} )} & {L_{yy}( {x,y} )}\end{bmatrix}} & {{Equation}\mspace{14mu} 2}\end{matrix}$

where L_(aa)(x,y) is second partial derivative in the a direction andL_(ab)(x,y) is the mixed partial second derivative in the a and bdirections. Determinant of Hessian map L_(|H|) of the difference imageI_(diff) ^(s) ^(k) is the map of the determinant of H at each pixel. Inone embodiment, given a query for red or bright lesion of minimum sizemin_(s) ₀ and maximum size max_(s) ₀ which are scaled to min_(s) _(k)and max_(s) _(k) respectively for scale s_(k), the following operationsare performed, as depicted in FIG. 11:

-   1. Mask M that marks red pixels in the scaled image I^(s) ^(k) is    generated as follows.

${M( {x,y} )} = \{ \begin{matrix}1 & {{{if}\mspace{14mu} {I_{diff}^{s_{k}}( {x,y} )}} < 0} \\0 & {{otherwise}\mspace{65mu}}\end{matrix} $

Mask image M if bright pixels are to be marked.

${M( {x,y} )} = \{ \begin{matrix}1 & {{{if}\mspace{14mu} {I_{diff}^{s_{k}}( {x,y} )}} > 0} \\0 & {{otherwise}\mspace{65mu}}\end{matrix} $

-   2. Image with just red (or bright) pixels I_(col) ^(s) ^(k) is    generated by using mask M.

I _(col) ^(s) ^(k) (x,y)=I _(diff) ^(s) ^(k) (x,y)M(x,y)

-   3. Mask P_(doh) containing the local peaks in determinant of Hessian    L_(|H|) is generated.-   4. The maximum value i_(max) ^(s) ^(k) in I_(col) ^(s) ^(k) is    found, and the pixels marked by mask P_(doh) are set to I_(max) ^(s)    ^(k) +1.

i _(max) ^(s) ^(k) =max(I _(col) ^(s) ^(k) )

I _(col) ^(s) ^(k) (x,y|P _(doh)(x,y)=1)=i _(max) ^(s) ^(k) +1

-   5. The resultant image I_(col) ^(s) ^(k) is hysteresis thresholded    with the high threshold t^(hi) and low threshold t^(lo) to obtain    mask G_(col) ^(s) ^(k) . In one embodiment, t^(hi) is set to the    larger of 97 percentile of I_(col) ^(s) ^(k) or 3, and t^(lo) is set    to the larger of 92 percentile of I_(col) ^(s) ^(k) or 2.-   6. The resulting mask G_(col) ^(s) ^(k) is applied on determinant of    Hessian map L_(|H|) to obtain L_(|H|,col).

L _(|H|,col)(x,y)=L _(|H|)(x,y)G _(col) ^(s) ^(k) (x,y)

-   7. Mask P_(doh,col) containing the local peaks in determinant of    Hessian L_(|H|,col) is generated.-   8. Pixels in I_(col) ^(s) ^(k) marked by mask P_(doh,col) are set to    i_(max) ^(s) ^(k) +1.

I _(col) ^(s) ^(k) (x,y)|P _(doh,col)(x,y)=1)=i _(max) ^(s) ^(k) +1

-   9. I_(col) ^(s) ^(k) is then masked with G_(col) ^(s) ^(k) .

I _(col,masked) ^(s) ^(k) (x,y)=I _(col) ^(s) ^(k) (x,y)G _(col) ^(s)^(k) (x,y)

-   10. The resultant image I_(col,masked) ^(s) ^(k) is hysteresis    thresholded with the high threshold t^(hi) and low threshold t^(lo)    to obtain mask F_(col) ^(s) ^(k) . In one embodiment, t^(hi) is set    to the larger of i_(max) ^(s) ^(k) or 3, and t^(lo) is set to the    larger of 92 percentile of I_(col) ^(s) ^(k) or 2.-   11. Locations with area larger than max_(s) _(k) are removed from    this mask F_(col) ^(s) ^(k) . Similarly, locations with area smaller    than min_(s) _(k) are also removed. The locations indicated by the    resulting pruned mask Z_(col) ^(s) ^(k) are interesting regions    scaled by s_(k).

In another embodiment, F_(col) ^(s) ^(k) is obtained after hysteresisthresholding I_(col) ^(s) ^(k) in (3) above with the high thresholdt^(hi) and low threshold t^(lo). This approach may lead to a largernumber of interesting points being picked.

In another embodiment, the maximum number of interesting areas (orblobs) that are detected for each scale can be restricted. This approachmay lead to better screening performance. Blobs can be ranked based onthe determinant of Hessian score. Only the top M blobs per scale basedon this determinant of Hessian based ranking are preserved in theinterest region mask. Alternatively, a blob contrast number can be usedto rank the blobs, where the contrast number is generated by computingmean, maximum, or median of intensity of each pixel within the blob, orby using a contrast measure including but not limited to Michelsoncontrast. The top M blobs per scale based on this contrast ranking arepreserved in the interest region mask. Alternatively, at each scale, theunion of the top M blobs based on contrast ranking and the top N blobsbased on determinant of Hessian based ranking can be used to generatethe interest region mask. Blobs that were elongated potentially belongto vessels and can be explicitly excluded from this mask. Blobs might beapproximately circular or elongated. Approximately circular blobs mayoften represent lesions. Elongated blobs represent vasculature. The topblobs are retained at each scale and this is used to generate theP_(doh,col) mask. The resultant P_(doh,col) is then used to pick thedetected pixels. Another variation used for P_(doh,col) mask generationwas logical OR of the mask obtained with top ranked blobs based on thedoh score and the contrast score. Blot hemorrhages can be included byapplying a minimum filter at each scale to obtain G_(col) ^(s) ^(k)rather than using the median normalized difference image.

The pixels in the pruned mask Z_(col) ^(s) ^(k) at each of the scales_(k) are rescaled to scale s₀ and the result is a set of pixels markedfor further lesion analysis. This leads to natural sampling of largelesion blobs, choosing a subset of pixels in large blobs, rather thanusing all the pixels. In one embodiment, on average, retinal fundusimages with over 5 million pixels can be reduced to about 25,000“interesting” pixels leading to elimination of 99.5% of the totalpixels. FIG. 12B shows the detected interest regions for an exampleretinal image of FIG. 12A.

As part of the automated detection, in one embodiment, the system may beconfigured to process the retinal image and during such processingprogressively scale up or down the retinal image using a fixed scalingfactor; designate groups of neighboring pixels within a retinal image asactive areas; and include the active areas from each scale as interestregions across multiple scales.

E. Local Region Descriptors

The pixels or the local image regions flagged as interesting by themethod described above in the section entitled “Interest RegionDetection,” can be described using a number or a vector of numbers thatform the local region “descriptor”. In one embodiment, these descriptorsare generated by computing two morphologically filtered images with themorphological filter computed over geometric-shaped local regions (suchas a structuring element as typically used in morphological analysis) oftwo different shapes or sizes and taking the difference between thesetwo morphological filtered images. This embodiment produces one number(scalar) describing the information in each pixel. By computing suchscalar descriptors using morphological filter structural elements atdifferent orientations and/or image scales, and stacking them into avector, oriented morphological descriptors and/or multi-scalemorphological descriptors can be obtained. In one embodiment, a medianfilter is used as the morphological filter to obtain oriented mediandescriptors, and multi-scale median descriptors. In another embodiment,multiple additional types of local descriptors can be computed alongsidethe median and/or oriented median descriptors.

As part of the automated generation of descriptors, in one embodiment,the first geometric shape is either a circle or a regular polygon andthe second geometric shape is an elongated structure with a specifiedaspect ratio and orientation, and the system is configured to generate avector of numbers, the generation comprising: varying an orientationangle of the elongated structure and obtaining a number each for eachorientation angle; and stacking the obtained numbers into a vector ofnumbers.

In another embodiment, the number or the vectors of numbers can becomputed on a multitude of images obtained by progressively scaling upand/or down the original input image with a fixed scaling factorreferred to as multi-scale analysis, and stacking the obtained vector ofnumbers into a single larger vector of numbers referred to asmulti-scale descriptors.

These local region descriptors can be tailored to suit specific imageprocessing and analysis applications such as, for example:

-   -   i. describing landmark points for automated image registration        (as described in the section below entitled “Detection And        Description Of Landmark Points”),    -   ii. evaluating the quality of images (as described in the        section below entitled “Descriptors That Can Be Used For Quality        Assessment”),    -   iii. lesion localization (as described in the section below        entitled “Processing That Can Be Used To Locate The Lesions”).

IV. Automated Image Registration A. General Description

This section describes embodiments directed to image-to-imageregistration. Image-to-image registration includes automated alignmentof various structures of an image with another image of the same objectpossibly taken at a different time or different angle, different zoom,or a different field of imaging, where different regions are imaged witha small overlap. When applied to retinal images, registration caninclude identification of different structures in the retinal imagesthat can be used as landmarks. It is desirable that these structures areconsistently identified in the longitudinal images for the registrationto be reliable. The input retinal images (Source image I_(source),Destination image I_(dest)) can be split into two parts:

-   -   constant regions in which structures are constant, for example,        vessels ONH, and    -   variable regions in which structures are changing, for example,        lesions.

Landmarks are detected at the constant regions and are matched usingdifferent features. These matches are then used to evaluate theregistration model. FIG. 13A shows an overview of the operationsinvolved in registering two images in one embodiment. The keypointdescriptor computation block 300 computes the descriptors used formatching image locations from different images. One embodiment of thekeypoint descriptor computation block is presented in FIG. 13B. Theblocks shown in FIGS. 13A and 13B here can be implemented on the cloud19014, a computer or computing device 19004, a mobile device 19008, orthe like as shown in FIG. 1. The matching block 302 matches imagelocations from different images. The RANdom Sample And Consensus(RANSAC) based model fitting block 304 estimates image transformationsbased on the matches computed by the matching block 302. The warpingblock 306 warps the image based on the estimated image transformationmodel evaluated by RANSAC based model fitting block 304. Source image308 is the image to be transformed. Destination image 314 is thereference image to whose coordinates the source image 308 is to bewarped using the warping block 306. Source image registered todestination image 312 is the source image 308 warped into thedestination image 314 coordinates using the warping block 306.

B. Registration

1. Detection and Description of Landmark Points

FIG. 13B provides an overview of descriptor computation for oneembodiment of the image registration module. The image 100 can refer tothe retinal data, single or multidimensional, that has been capturedusing a retinal imaging device, such as cameras for color image capture,fluorescein angiography (FA), adaptive optics, optical coherencetomography (OCT), hyperspectral imaging, scanning laser ophthalmoscope(SLO), wide-field imaging or ultra-wide-field imaging. Fundus maskgeneration block 102 can provide an estimation of a mask to extractrelevant image sections for further analysis. Image gradabilitycomputation module 104 can enable computation of a score thatautomatically quantifies the gradability or quality of the image 100 interms of analysis and interpretation by a human or a computer. Imageenhancement module 106 can enhance the image 100 to normalize theeffects of lighting, different cameras, retinal pigmentation, or thelike. Vessel extraction block 400 can be used to extract the retinalvessels from the fundus image 100. Keypoint detection block 402 canevaluate image locations used for matching by matching block 302.Descriptor computation block 404 can evaluate descriptors at keypointlocations to be used for matching by matching block 302.

Branching of vessels can be used as reliable landmark points orkeypoints for registration. By examining for blobs across multiplescales at locations with high vesselness, locations that are promisingkeypoints for registration can be extracted. In one embodiment,vesselness map is hysteresis thresholded with the high and lowthresholds set at 90 and 85 percentiles respectively for the givenimage. These thresholds may be chosen based on percentage of pixels thatare found to be vessel pixels on an average. The resulting binary mapafter removing objects with areas smaller than a predefined threshold,chosen, for example, based on the smallest section of vessels that areto be preserved, V_(thresh), is used as a mask for potential keypointlocations. For example, 1000 pixels are used as the threshold in oneembodiment, a value chosen based on the smallest section of vessels tobe preserved.

In one embodiment, the fundus image can be smoothed with Gaussianfilters of varying sigma, or standard deviation. In one implementation,the range of sigmas, or standard deviations, can be chosen based onvessel widths. For example, sigmas (σ) of 10, 13, 20 and 35 pixels canbe used to locate vessel branches at different scales. Scale normalizeddeterminant of Hessian can be computed at pixel locations labeled byV_(thresh) at each of these scales. In one embodiment, local peaks inthe determinant of Hessian map, evaluated with the minimum distancebetween the peaks, for example, D=1+(σ−0.8)/0.3, are chosen as keypointsfor matching.

The local image features used as descriptors in some embodiments arelisted below. Some descriptors are from a patch of N×N points centeredat the keypoint location. In one embodiment, N is 41 and the points aresampled with a spacing of σ/10. Local image features used as descriptorsfor matching in one embodiment can include one or more of the following:

-   -   Vector of normalized intensity values (from the green channel);    -   Vector of normalized vesselness values;    -   Histogram of vessel radius values from the defined patch at        locations with high vesselness, for example, greater than 90        percentile vesselness over the image. (Using locations with high        vesselness can ensure that locations with erroneous radius        estimates are not used.)    -   Oriented median descriptors (OMD): Vector of difference in        responses between an oriented median filter and median filtered        image        These descriptors can provide reliable matches across        longitudinal images with varying average intensities.

In one embodiment, the keypoints in the source and destination imagesare matched using the above defined descriptors. FIG. 14 shows matchedkeypoints from the source and destination images. In one embodiment,Euclidean distance is used to measure similarity of keypoints. In oneembodiment, brute-force matching is used get the best or nearly bestmatches. In one embodiment, matches that are significantly better thanthe second best or nearly best match are preserved. The ratio of thedistance between the best possible match and the second best or nearlybest possible match is set to greater than 0.9 for these preservedmatches. In one embodiment, the matches are then sorted based on thecomputed distance. The top M matches can be used for model parametersearch using, for example, the RANSAC algorithm. In one embodiment, Mcan be 120 matches.

2. Model Estimation Using RANSAC

Some embodiments pertain to the estimation of the model for image toimage registration. The RANSAC method can be used to estimate a model inthe presence of outliers. This method is helpful even in situationswhere many data points are outliers, which might be the case for somekeypoint matching methods used for registration. Some embodimentsdisclose a framework for model estimation for medical imaging. However,the disclosed embodiments are not limited thereto and can be used inother imaging applications.

The RANSAC method can include the following actions performediteratively (hypothesize-and-test framework).

-   1. Hypothesize: Randomly select minimal sample sets (MSS) from the    input dataset (the size of the MSS, k, can be the smallest number of    data points sufficient to estimate the model). Compute the model    parameters using the MSS.-   2. Test: For the computed model, classify the other data points    (outside the MSS) into inliers and outliers. Inliers can be data    points within a distance threshold t of the model. The set of    inliers constitutes the consensus set (CS).

These two actions can be performed iteratively until the probability offinding a better CS drops below a threshold. The model that gives thelargest cardinality for the CS can be taken to be the solution. Themodel can be re-estimated using the points of the CS. The RANSAC methodused can perform one or more of the following optimizations to helpimprove the accuracy of estimation, and efficiency of computation, interms of number of the iterations.

-   -   Instead of using a fixed threshold for the probability of        finding a better CS, the threshold is updated after each        iteration of the algorithm.    -   For two iterations producing CS of the same size, the CS with        lower residue or estimation error (as defined by the algorithm        used for model estimation), is retained.

The random selection of points for building the MSS could result indegenerate cases from which the model cannot be reliably estimated. Forexample, homography computation might use four Cartesian points (k=4),but if three of the four points are collinear, then the model may not bereliably estimated. These degenerate samples can be discarded. Checksperformed during image registration to validate the MSS can prevent orminimize the occurrence of three or more of collinear chosen points, aswell as allowing the three points to be at a certain distance from eachother to obtain good spatial distribution over the image.

3. Image Registration Models

Other processes for obtaining retinal image registration can be used.Customizations usable with the RANSAC method in order to compute themodels are also provided.

A point on an image can be denoted as a 2D vector of pixel coordinates[x y]^(T)∈

². It can also be represented using homogeneous coordinates as a 3Dvector [wx wy w]^(T) in projective space where all vectors that differonly by a scale are considered equivalent. Hence the projective spacecan be represented as

²=

³−[0 0 0]. The augmented vector [x y 1]^(T) can be derived by dividingthe vector components of the homogeneous vector by the last element w.The registration models can be discussed using this coordinate notation,with [x y 1]^(T), the point in the original image, and [x′ y′ 1]^(T),the point in the “registered” image.

The rotation-scaling-translation (RST) model can handle scaling by afactor s, rotation by an angle φ, and translation by [t_(x) t_(y)]^(T).In one embodiment, the transformation process can be expressed as:

$\begin{matrix}{\begin{bmatrix}x^{\prime} \\y^{\prime} \\1\end{bmatrix} = {{\underset{\underset{T_{\theta}}{}}{\begin{bmatrix}{s\; \cos \; \phi} & {{- s}\; \sin \; \phi} & t_{x} \\{s\; \sin \; \phi} & {s\; \cos \; \phi} & t_{y} \\0 & 0 & 1\end{bmatrix}}\begin{bmatrix}x \\y \\1\end{bmatrix}}.}} & {{Equation}\mspace{14mu} 3}\end{matrix}$

This model, denoted by T_(θ), can be referred to as a similaritytransformation since it can preserve the shape or form of the object inthe image. The parameter vector θ=[s cos φ s sin φ t_(x) t_(y)]^(T) canhave 4 degrees of freedom: one for rotation, one for scaling, and twofor translation. The parameters can be estimated in a least squaressense after reordering Equation 3 as:

$\underset{\underset{b}{}}{\begin{bmatrix}x^{\prime} \\y^{\prime}\end{bmatrix}} = {\underset{\underset{A}{}}{\begin{bmatrix}x & {- y} & 1 & 0 \\y & x & 0 & 1\end{bmatrix}}\theta}$

The above matrix equation has the standard least squares form of Aθ=b,with θ being the parameter vector to be estimated. Each keypointcorrespondence contributes two equations, and since total number ofparameters is four, at least two such point correspondences can be usedto estimate θ. In this example, the cardinality of MSS is k=2. Theequations for the two-point correspondences are stacked over each otherin the above form Aθ=b, with A being a matrix of size 4×4, and b beingvector of size 4×1. In this example, at each hypothesize operation ofRANSAC, two-point correspondences are randomly chosen, and theparameters are estimated. The error between the ith pair of pointcorrespondences x_(i) and x′_(i) for the computed model T_(θ) can bedefined as:

$\begin{matrix}{e_{i}^{2}\overset{def}{=}{\underset{\underset{{reprojection}\mspace{14mu} {error}}{}}{( {{x_{i}^{\prime} - {T_{\theta}( x_{i} )}}}^{2} )} + {{x_{i} - {T_{\theta}^{- 1}( x_{i}^{\prime} )}}}^{2}}} & {{Equation}\mspace{14mu} 4}\end{matrix}$

The first term in the above equation can be called the reprojectionerror and e_(i) as a whole can be referred to as the symmetricreprojection error (SRE). In one embodiment, point correspondences whoseSRE are below a certain threshold can be retained as inliers in the testoperation of RANSAC. The average SRE over the points in the CS can beused as the residue to compare two CS of the same size.

The affine model can handle shear and can be expressed as:

$\begin{bmatrix}x^{\prime} \\y^{\prime} \\1\end{bmatrix} = {{\underset{\underset{T_{\theta}}{}}{\begin{bmatrix}a_{11} & a_{12} & t_{x} \\a_{21} & a_{22} & t_{y} \\0 & 0 & 1\end{bmatrix}}\begin{bmatrix}x \\y \\1\end{bmatrix}}.}$

In one embodiment, the parameter vector for affine model, θ, can be ofsize 6, and can be implemented with three-point correspondences (k=3).In this example, the above equation can be re-written into the standardleast squares form Aθ=b, with A being a matrix of size 6×6, and b beingvector of size 6×1 for the three-point correspondences. As before, θ canthen be estimated using least squares. The selection of points for MSScan be done to avoid the degenerate cases by checking for collinearityof points. The SRE can then be computed (with T being the affine model)and used to validate inliers for CS and compute the residue forcomparison of two CS of the same size.

The homography model can handle changes in view-point (perspective) inaddition to rotation, scaling, translation, and shear and representedas:

$\begin{bmatrix}{x^{\prime} \cdot w^{\prime}} \\{y^{\prime} \cdot w^{\prime}} \\w^{\prime}\end{bmatrix} = {{\underset{\underset{H}{}}{\begin{bmatrix}\theta_{1} & \theta_{4} & \theta_{7} \\\theta_{2} & \theta_{5} & \theta_{8} \\\theta_{3} & \theta_{6} & \theta_{9}\end{bmatrix}}\begin{bmatrix}x \\y \\1\end{bmatrix}}.}$

In this example, even though the homography matrix H is a 3×3 matrix, ithas only 8 degrees of freedom due to the W scaling factor in theleft-hand-side of the above equation. In order to fix the 9th parameter,an additional constraint of ∥θ∥=1 can be imposed, where θ=[θ₁, θ₂, . . ., θ₉]^(T). Estimation of this parameter vector can be performed withfour-point correspondences and done using the normalized direct lineartransform (DLT) method/algorithm, which can produce numerically stableresults. For the MSS selection, one or more of the following actions canbe taken to avoid degenerate cases:

-   -   Checking for collinearity of three or more points by computing        the area of the triangle formed by the three points and checking        if it is less than a predefined threshold, for example, 2        pixel-squared;    -   Choosing distances between the chosen points greater than a        threshold, for example, 32 pixels; or    -   Preserving the order of points after transformation, for        example, using techniques discussed in Pablo Marquez-Neila et        al., “Speeding-up Homography Estimation in Mobile Devices,”        Journal of Real-Time Image Processing (Jan. 9, 2013).        The SRE can be used to form and validate the CS.

The quadratic model can be used to handle higher-order transformationssuch as x-dependent y-shear, and y-dependent x-shear. Since the retinais sometimes modeled as being almost spherical, a quadratic model ismore suited for retinal image registration. In one embodiment, the modelcan be represented as:

${\begin{bmatrix}x^{\prime} \\y^{\prime}\end{bmatrix} = {\underset{\underset{Q}{}}{\begin{bmatrix}\theta_{1} & \theta_{2} & \theta_{3} & \theta_{4} & \theta_{5} & \theta_{6} \\\theta_{7} & \theta_{8} & \theta_{9} & \theta_{10} & \theta_{11} & \theta_{12}\end{bmatrix}}{\Psi ( \begin{bmatrix}x \\y\end{bmatrix} )}}},$

where Ψ([x y]^(T)) is [x² xy y² x y 1]^(T). Unlike RST, affine, orhomography models, the quadratic model may not be invertible. In oneembodiment, the model can have 12 parameters and 6 keypointcorrespondences for estimation, that is, the size of MSS is k=6. Theabove equation can be rewritten in the standard least squares form Aθ=b,where the parameter vector θ=[θ₁, θ₂, . . . , θ₁₂]^(T), A is a matrix ofsize 12×12, and b is a vector of size 12×1 for the six pointcorrespondences. θ can be estimated using least squares.

As with homography, MSS selection may be done to avoid degenerate cases.Since the transform may not be invertible, the reprojection error, thatis, the first term on the right-hand-side of Equation 4, is computed andused to form and validate the CS.

The models discussed above present a set of models that can be used inone or more embodiments of the image registration module. This does notpreclude the use of other models or other parameter values in the samemethods and systems disclosed herein.

4. Registration Model Refinement

In one embodiment, an initial estimate of homography is computed asdescribed in the section above entitled “Model Estimation Using RANSAC”.Using the initial homography estimate, the keypoint locations in thesource image, I_(source) are transformed to the destination image,I_(dest) coordinates. In one embodiment, the keypoint matching operationcan be repeated with an additional constraint that the Euclideandistance between the matched keypoints in the destination imagecoordinates be lesser than the maximum allowable registration errorR_(e). In one embodiment, R_(e) can be fixed at 50 pixels. This processconstrains the picked matches and results and can improve registrationbetween the source and destination images.

Using the refined matches, various registration models can be fittedincluding Rotation-Scale-Translation (RST), Homography and Quadratic. Inone embodiment, for each model, the minimum number of matches may besubtracted from the size of the obtained consensus set. In oneembodiment, the model with the maximum resulting quantity can be chosenas the best model. If two models end up with identical values, then thesimpler model of the two can be chosen as the best model.

5. Image Warping

An aspect of the image registration module may involve warping of theimage to the coordinate system of the base image. FIG. 15 shows examplesof source and destination images that are registered, warped, andoverlaid on each other. In one embodiment, the computed registrationmodels can be used to transform the pixel locations from the originalimage to the registered image. When transformation is applied directly,the integer pixel locations in the input image can map to non-integerpixel locations in the registered image, resulting in “holes” in theregistered image, for example, when the registered image dimensions arelarger than that of the input image. The “holes” can be filled byinterpolating the transformed pixels in the registered image.Alternatively, inverse transform can be used to map registered pixellocations to the input image. For pixels that land at integer locationsafter inverse mapping, the intensity values can be copied from the inputimage, while the intensity values at non-integer pixels in input imagecan be obtained by interpolation.

The above approach can be applied to the invertible registration modelssuch as RST, affine, or homography. If the non-invertible quadraticmodel is used, a forward transform T_(θ) can be used to build a mappingof the integer pixel locations in the input image to the registeredimage. To find the pixel intensity at an integer location in theregistered image, the forward mapping can be checked for any inputlocation maps to the registered location under consideration. If such amapping exists, the intensity value is copied. In the absence of such avalue, the n-connected pixel locations in an m x m neighborhood aroundthe registered pixel can be checked. In one embodiment, n is 8 and m is3. In one embodiment, the closest n pixels in the input image are found,and the pixel intensity at their centroid location is interpolated toobtain the intensity value at the required pixel location. This analysismay be helpful when pixels in a neighborhood in the input image stay inalmost the same relative positions even in the registered image forretinal image registration. In another embodiment, the estimatedquadratic model can be used to compute the forward mapping, swapping theinput and registered pixel locations, and estimating the inverse mapping{circumflex over (T)}_(θ) ⁻¹ using least squares can be used to computethe forward mapping. A mapping can be applied to the integer locationsin the registered image to generate the corresponding mapping from theinput image.

V. Automated Image Assessment

In some embodiments, automated image assessment can be implemented usingone or more features of the automated low-level image processing, and/orimage registration techniques described above; however, using thesetechniques is not mandatory nor necessary in every embodiment ofautomated image assessment.

A. Lens Shot Image Classification

Typically multiple images of the fundus from various fields and botheyes are collected from a patient during a visit. In addition to thecolor fundus images, photographs of the patient's eye's lens may also beadded to the patient encounter images, as illustrated in FIG. 16. In oneembodiment, an automated DR screening system automatically and reliablyseparates these lens shot images from the actual color fundus images.

In one embodiment, lens shot image classification is achieved byprimarily using structural and color descriptors. A given image isresized to a predetermined size. The histogram of orientations (HoG)feature is computed on the green channel to capture the structure of theimage. The vesselness maps for images are computed, using for examplethe processes disclosed in the section below entitled “VesselExtraction”. The vesselness maps are hysteresis thresholded with thelower and higher thresholds set, for example, to 90 and 95 percentilesrespectively to obtain a mask. The color histograms of the pixels withinthe mask are computed. The final descriptor is obtained by appending thecolor histogram descriptors to the HoG descriptors.

The order in which the images were obtained is also sometimes anindicator of an image being a lens shot image. This was encoded as abinary vector indicating absolute value of the difference between theimage index and half the number of images in an encounter.

On a dataset of 10,104 images with over 2000 lens shot images on 50-50train-test splits area under receiver operating characteristics (ROC)curve (AUROC) of over 0.998 were obtained.

B. Image Quality Assessment

1. General Description

In one embodiment, the system may include computer-aided assessment ofthe quality or gradability of an image. Assessment of image gradabilityor image quality can be important to an automated screening system. Thefactors that reduce quality of an image may include, for example, poorfocus, blurred image due to eye or patient movement, large saturatedand/or under-exposed regions, or high noise. In addition, the quality ofimage can be highly subjective. In the context of retinal imageanalysis, “image characteristics that allow for effective screening ofretinopathy by a human grader or software” are preferred, whereas imageswith hazy media are flagged as being of insufficient quality foreffective grading. Quality assessment can allow the clinician todetermine whether he needs to immediately reimage the eye or refer thepatient to a clinician depending on the screening setup employed.

FIG. 17 shows a detailed view of one embodiment of scenarios in whichimage quality assessment can be applied. The patient 179000 is imaged byan operator 179016 using an image capture device 179002. In thisembodiment, the image capture device is depicted as a retinal camera.The images captured are sent to a computer or computing device 179004for image quality analysis. Good quality images 179010 are sent forfurther processing for example on the cloud 179014, a computer orcomputing device 179004, a mobile device 179008, or the like. Poorquality images are rejected, and the operator is asked to retake theimage. In one embodiment a number is computed that reflects the qualityof the image rather than simply classifying the image as of poor qualityor not. In another embodiment, all captured images are sent to the cloud179014, a computer or computing device 179004, a mobile device 179008,or the like, where the quality analysis takes place and the analysisresults are sent back to the operator or the local computer or computingdevice 179004. In another embodiment, the computer itself could directthe image capture device to retake the image. In the second scenario,the patient 179000 takes the image himself using an image capture device179006, which in this case is shown as a retinal camera attachment for amobile device 179008. Quality analysis is done on the mobile device.Poor quality images are discarded, and the image capture device is askedto retake the image. Good quality images 179012 are sent for furtherprocessing.

FIG. 18 gives an overview of one embodiment of a process for performingimage quality computation. The illustrated blocks may be implemented onthe cloud 179014, a computer or computing device 179004, a mobile device179008, or the like, as shown in FIG. 17. The gradability interestregion identification block 602 evaluates an indicator image that istrue or false for each pixel in the original image and indicates ordetermines whether the pixel is interesting or represents an activeregion, so that it should be considered for further processing toestimate gradability of the image. Gradability descriptor setcomputation block 600 is configured to compute a single-dimensional ormulti-dimensional float or integer valued vector that provides adescription of an image region to be used to evaluate gradability of theimage.

In one embodiment, the images are first processed using a Hessian basedinterest region and “vesselness” map detection technique as shown inFIG. 19. The obtained image is then converted to a binary mask byemploying hysteresis thresholding, followed by morphological dilationoperation. The application of this binary mask to the original imagegreatly reduces the number of pixels to be processed by the subsequentblocks of the quality assessment pipeline, without sacrificing theaccuracy of assessment.

Next, image quality descriptors are extracted using the masked pixels inthe image. Table 1 is one embodiment of example descriptors that may beused for retinal image quality assessment.

TABLE 1 Descriptor Name Length How it contributes Local sum-modified 20Captures the degree of local Laplacian focus/blur in an image Localsaturation 20 × 2 Captures the #pixels with descriptor “right” exposureLocal Michelson 20 Captures the local contrast contrast in an image R,G, B color descriptors 20 × 3 Captures the degree of local focus/blur inan image Local entropy descriptors 20 Captures the local texture Localbinary pattern descriptors 20 Captures the local texture Local noisemetric descriptors 20 × 3 Captures the local noise

In one embodiment, using 3-channel (RGB) color retinal fundus images,the green channel is preferred over red or blue channels for retinalanalysis. This is because the red channel predominantly captures thevasculature in the choroidal regions, while the blue channel does notcapture much information about any of the retinal layers. This isillustrated for an example color fundus image, shown in FIG. 20A asgrayscale, with the red channel, shown in FIG. 20B as grayscale, thegreen channel, shown in FIG. 20C as grayscale, and the blue channel,shown in FIG. 20D as grayscale. Hence, in one embodiment, the greenchannel of the fundus image is used for processing. In otherembodiments, all the 3 channels or a subset of them are used forprocessing.

In one embodiment, the system classifies images based on one or more ofthe descriptors discussed below:

2. Descriptors that can be Used for Quality Assessment

a. Focus Measure Descriptors

In one embodiment, for measuring the degree of focus or blur in theimage, the sum-modified Laplacian is used. This has shown to be anextremely effective local measure of the quality of focus in naturalimages, as discussed in S. K. Nayar and Y. Nakagawa, “Shape from Focus,”IEEE Transactions on Pattern Analysis and Machine Intelligence 16, No. 8(1994): 824-831. For the input image I, the sum-modified LaplacianI_(ML) at a pixel location (x,y) can be computed as:

I _(ML)(x,y)=|2I(x,y)−I(x−1,y)−I(x+1,y)|+|2I(x,y)−I(x,y−1)−I(x,y+1)|.

A normalized histogram can be computed over the sum-modified Laplacianvalues in the image to be used as focus measure descriptor. In practice,I_(ML) values that are too low, or too high may be unstable for reliablymeasuring focus in retinal images and can be discarded before thehistogram computation. In one embodiment, the low and high thresholdsare set to 2.5 and 20.5 respectively, which was empirically found togive good results. The computed descriptor has a length of 20. Inpractice, computing the focus descriptors on the image obtained afterenhancement and additional bilateral filtering provides bettergradability assessment results.

b. Saturation Measure Descriptors

In one embodiment, the local saturation measure captures the pixels thathave been correctly exposed in a neighborhood, by ignoring pixels thathave been under-exposed or over-exposed. The correctly exposed pixelsare determined by generating a binary mask M using two empiricallyestimated thresholds, S_(lo) for determining under-exposed pixels andS_(hi) for determining over-exposed pixels. At a pixel location (x,y)the binary mask is determined as:

${M( {x,y} )} = \{ \begin{matrix}1 & {{{{if}\mspace{14mu} S_{lo}} < {I( {x,y} )} < S_{hi}},} \\0 & {{otherwise}.}\end{matrix} $

The local saturation measure at location (x,y) is then determined as:

${{I_{sat}( {x,y} )} = {\sum\limits_{i,{j \in }}{M( {{x - i},{y - j}} )}}},$

where

is a neighborhood of pixels about the location (x,y). In one embodiment,

is a circular patch of radius r pixels. In one embodiment, the followingvalues can be used for an 8-bit image: S_(lo)=40, S_(hi)=240, r=16. Anormalized histogram is then computed over I_(Sat) to generate thesaturation measure descriptors. In one embodiment, the computeddescriptor has a length of 20 for each channel. In addition to thesaturation measure for the green channel, the inclusion of saturationmeasure for the blue channel was empirically found to improve thequality assessment.

c. Contrast Descriptors

In one embodiment, contrast is the difference in luminance and/or colorthat makes an object (or its representation in an image)distinguishable. The contrast measure may include Michelson-contrast,also called visibility, as disclosed in Albert A. Michelson, Studies inOptics (Dover Publications.com, 1995). The local Michelson-contrast at apixel location (x,y) is represented as:

${{I_{MC}( {x,y} )} = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}}},$

where I_(min) and I_(max) are the minimum and maximum pixel intensitiesin a neighborhood

. In one embodiment,

is a circular patch of radius r pixels. A normalized histogram is thencomputed over I_(MC) to obtain the contrast descriptors. In oneembodiment, the computed descriptor has a length of 20.

d. RGB Color Descriptors

In one embodiment, normalized RGB color histograms are computed over thewhole image and used as descriptors of color. In one embodiment, thecomputed descriptor has a length of 20 for each of the R, G, and Bchannels.

e. Texture Descriptors

In one embodiment, descriptors based on local entropy, for example usingtechniques disclosed in Rafael C. Gonzalez and Woods E. Richard,“Digital Image Processing,” Prentice Hall Press, ISBN 0-201-18075-8(2002), are incorporated to characterize the texture of the input image.For an image of bit-depth, B, the normalized histogram p_(i) at pixellocation (x,y), is first computed considering the pixels that lie in aneighborhood

around location (x,y). In one embodiment,

is a circular patch of radius r pixels. Denoting, the local normalizedhistogram as p_(i)(x,y), i=0, 1, . . . , 2^(B)−1, the local entropy isobtained as:

${{I_{Ent}( {x,y} )} = {- {\sum\limits_{i = 0}^{2^{B} - 1}\; {{p_{i}( {x,y} )} \cdot {\log_{2}( {p_{i}( {x,y} )} )}}}}},$

A normalized histogram of the local entropy image I_(Ent) is then usedas a local image texture descriptor. In one embodiment, the computeddescriptor would have a length of 20.

In addition to entropy, in another embodiment, local binary patterns(LBP) based descriptors are also computed to capture the texture in theimage. The LBP can be computed locally for every pixel, and in oneembodiment, the normalized histogram of the LBP image can be used as adescriptor of texture. The computed descriptor would still have a lengthof 20.

f. Noise Metric Descriptor

In one embodiment, since noise also affects the quality of an image, anoise metric descriptor for retinal images is also incorporated using,for example, techniques disclosed in Noriaki Hashimoto et al.,“Referenceless Image Quality Evaluation for Whole Slide Imaging,”Journal of Pathology Informatics 3 (2012): 9. For noise evaluation, anunsharp masking technique may be used. The Gaussian filtered (blurred)retinal image G, is subtracted from the original retinal image, I, toproduce a difference image D with large intensity values for edge ornoise pixels. In one embodiment, to highlight the noise pixels, thecenter pixel in a 3×3 neighborhood

is replaced with the minimum difference between it and the 8 surroundingpixels as:

D_(min)(x, y) = D(i, j) − D(x, y),

where (x,y) is the pixel location in the image. The resulting D_(min)image has high intensity values for noise pixels. In one embodiment, a20-bin normalized histogram of this image can be used as a noise metricdescriptor. The descriptor can be computed for the three channels of theinput retinal image.

3. Image Quality Classification or Regression

In one embodiment, the system includes a classification action for imagequality assessment. In another embodiment, regression analysis isconducted to obtain a number or value representing image quality. One ormore quality descriptors discussed above are extracted and concatenatedto get a single N-dimensional descriptor vector for the image. It isthen subjected to dimensionality reduction, new dimension, M, usingprincipal component analysis (PCA) to consolidate the redundancy amongthe feature vector components, thereby making quality assessment morerobust. The PCA may include techniques disclosed in Hervé Abdi and LynneJ. Williams, “Principal Component Analysis,” Wiley InterdisciplinaryReviews: Computational Statistics 2, No. 4 (2010): 433-459. In oneembodiment the PCA-reduced descriptor then train a support vectorregression (SVR) engine to generate a continuous score to be used forgrading the images, for example, as being of poor, fair, or adequatequality. The SVR may include techniques disclosed in Harris Drucker etal., “Support Vector Regression Machines,” Advances in NeuralInformation Processing Systems (1997): 155-161. In one embodiment, theparameters of the SVR were estimated using a 5-fold cross validation ona dataset of 125 images (73 adequate, 31 fair and 21 poor) labeled forretinopathy gradability by experts. FIG. 21 shows example images ofvarying quality that have been scored by the system. In anotherembodiment a support vector classifier (SVC) is trained to classify poorquality images from fair or adequate quality images. On the 125 imagedataset, the adequate and fair quality images were classified from thepoor quality images with accuracy of 87.5%, with an area under receiveroperating characteristics (AUROC) of 0.90. Further improvements areexpected with the incorporation of texture descriptors. In oneembodiment, the descriptor vector has a length of N=140, which getsreduced to

$M = {\lceil \frac{N}{8} \rceil = 18}$

after PCA. In another embodiment, the entire descriptor vector is used,without the PCA reduction, to train a support vector classifier todistinguish poor quality images from good quality ones. This setupobtained an average accuracy of 87.1%, with an average AUROC of 0.88,over 40 different test-train splits of a retinal dataset of 10,000images.

C. Vasculature Extraction

1. General Description

In one embodiment, the system is configured to identify retinalvasculature, for example, the major arteries and veins in the retina, inretinal images by extracting locations of vasculature in images.Vasculature often remains fairly constant between patient visits and cantherefore be used to identify reliable landmark points for imageregistration. Additionally, vessels in good focus are indicative of goodquality images, and hence these extracted locations may be useful duringimage quality assessment.

2. Identification Of Vessels

a. Vessel Extraction

One embodiment for vesselness computation is provided in FIG. 22. σrefers to the standard deviation of the Gaussian used for smoothing.Gaussian smoothing 1102 convolves the image with a Gaussian filter ofstandard deviation a. This operation is repeated at different values ofa. Hessian computation 1104 computes the Hessian matrix (for example,using Equation 2) at each pixel. Structureness block 1106 computes theFrobenius norm of the Hessian matrix at each pixel. Eigen values 1108 ofthe Hessian matrix are computed at each pixel. Vesselness in σ₁ 1110(Equation 5) is computed at a given pixel after smoothing the image withGaussian smoothing block 1102 of standard deviation σ₁. The maximum 1112at each pixel over multiple values of vesselness is computed atdifferent smoothing. Vesselness 1114 indicates the vesselness of theinput image 100.

In one embodiment, the vessels in the green channel of the color fundusimage can be enhanced after pre-processing using a modified form ofFrangi's vesselness using, for example, techniques disclosed inAlejandro F. Frangi et al., “Multiscale Vessel Enhancement Filtering,”in Medical Image Computing and Computer AssistedInterventation—MICCAI'98 (Springer, 1998), 130-137 (Frangi et al.(1988)). The input image is convolved with Gaussian kernels at a rangeof scales. Gradients L_(xx)(x,y), L_(xy)(x,y), L_(xy)(x,y) andL_(yy)(x,y) are then computed on these images and Hessian H_(s) iscomputed at multiple scales using, for example, Equation 2.

A measure for tubular structures

${R_{T} = \frac{\lambda_{1}}{\lambda_{2}}},$

where λ₁ and λ₂ are the Eigen values of H_(s) and |λ₁|≤λ₂ is computed.Structureness S is evaluated as the Frobenius norm of the Hessian. Thevesselness measure at a particular scale is computed for one embodimentas follows:

$\begin{matrix}{V = \{ \begin{matrix}{0,} & {{{{if}\mspace{14mu} \lambda_{2}} > 0},} \\{e^{- \frac{R_{T}^{2}}{2\beta^{2}}}( {1 - e^{- \frac{S^{2}}{2c^{2}}}} )} & {otherwise}\end{matrix} } & {{Equation}\mspace{14mu} 5}\end{matrix}$

In one embodiment, β is fixed at 0.5 as per Frangi et al. (1998), and cis fixed as the 95 percentile of the structureness S. The vesselnessmeasure across multiple scales is integrated by evaluating the maximumacross all the scales. Vesselness over multiple standardized datasetswere evaluated using, for example, DRIVE, as disclosed in Joes Staal etal., “Ridge-Based Vessel Segmentation in Color Images of the Retina,”IEEE Transactions on Medical Imaging 23, No. 4 (April 2004): 501-509,and STARE, as disclosed in A. Hoover, V. Kouznetsova, and M. Goldbaum,“Locating Blood Vessels in Retinal Images by Piecewise Threshold Probingof a Matched Filter Response,” IEEE Transactions on Medical Imaging 19,No. 3 (2000): 203-210. The combination of the custom image enhancementand modified Frangi vesselness computation can result in performancethat is close to the state of the art. In one embodiment, theunsupervised, non-optimized implementation takes less than 10 s on a605×700 pixel image. Some example vessel segmentations are shown in FIG.19. The receiver operating characteristics (ROC) curve of one embodimenton the STARE dataset is shown in FIG. 23. Table 2 compares the AUROC andaccuracy of one embodiment of the system on the DRIVE and STARE datasetswith human segmentation. This embodiment has better accuracy withrespect to gold standard when compared to secondary human segmentation.

TABLE 2 Accuracy (%) EyeTrace Human AUROC DRIVE 95.3% 94.7% 0.932 STARE95.6% 93.5% 0.914

In one embodiment, the vesselness map is then processed by a filterbankof oriented median filters. In one embodiment, the dimensions of themedian filters are fixed based on the characteristics of the vessels tobe preserved, for example, Height=3 pixels, Length=30 pixels, or 8orientations. At each pixel, the difference between the maximum andmedian filter response across orientations was evaluated. This providesa vasculature estimate that is robust to identify the presence of bloblesions or occlusions. FIG. 24 shows an example vessel extraction usingthe custom morphological filterbank analysis on a poor quality image.

b. Vessel Tracing

In one embodiment, level-set methods such as fast marching are employedfor segmenting the vessels and for tracing them. For example, fastmarching can be used with techniques disclosed in James A. Sethian, “AFast Marching Level Set Method for Monotonically Advancing Fronts,”Proceedings of the National Academy of Sciences 93, No. 4 (1996):1591-1595. The vessel tracing block may focus on utilizing customizedvelocity functions, based on median filterbank analysis, for thelevel-sets framework. At each pixel location the velocity function isdefined by the maximum median filter response. This embodiment leads toan efficient, mathematically sound vessel tracing approach. In oneembodiment, automatic initialization of start and end points for tracingthe vessels in the image is performed using automated optic nerve head(ONH) identification within a framework that provides a lesionlocalization system.

D. Lesion Localization

1. General Description

In one embodiment, the system is configured to localize lesions inretinal images. The lesions may represent abnormalities that aremanifestations of diseases, including diabetic retinopathy, maculardegeneration, hypertensive retinopathy, and so forth. FIG. 25 depictsone embodiment of a lesion localization process. The illustrated blocksmay be implemented on the cloud 19014, a computer or computing system19004, a mobile device 19008, or the like, as shown in FIG. 1. The image100 refers in general to the retinal data, single or multidimensional,that has been captured using a retinal imaging device, such as camerasfor color image capture, fluorescein angiography (FA), adaptive optics,optical coherence tomography (OCT), hyperspectral imaging, scanninglaser ophthalmoscope (SLO), wide-field imaging or ultra-wide-fieldimaging. Fundus mask generation block 102 estimates the mask to extractrelevant image sections for further analysis. Image gradabilitycomputation module 104 computes a score that automatically quantifiesthe gradability or quality of the image 100 in terms of analysis andinterpretation by a human or a computer. Image enhancement module 106enhances the image 100 to normalize the effects of lighting, differentcameras, retinal pigmentation, or the like. Interest regionidentification block 108 generates an indicator image with a true orfalse value for each pixel in the original image, that indicates ordetermines whether the pixel is interesting or represents active regionsthat may be considered for further processing. Descriptor setcomputation block 110 computes a single- or multi-dimensional float orinteger valued vector that provides a description of an image region.Examples include shape, texture, spectral, or other descriptors. Lesionclassification block 200 classifies each pixel marked by interest regionidentification block 108 using descriptors computed using descriptor setcomputation block 110 into different lesions. Joint segment recognitionblock 202 analyzes the information and provides an indicator of anyrecognized lesions.

2. Processing that can be Used to Locate the Lesions

a. Interest Region Detection

In some embodiments, interest region detection techniques described inthe section above entitled “Interest Region Detection” can be used tolocate lesions.

b. Descriptor Computation

In one embodiment, a set of descriptors that provide complementaryevidence about presence or absence of a lesion at a particular locationcan be used. Embodiments of the disclosed framework developed caneffectively describe lesions whose sizes vary significantly (for examplehemorrhages and exudates) due to local description of interest regionsat multiple scales.

Table 3 lists one embodiment of pixel level descriptors used for lesionlocalization and how the descriptors may contribute to lesionclassification.

TABLE 3 Descriptor Name Length How it contributes Median filterbank 90Bandpass median filter responses at multiple scales. Robustlycharacterizes interesting pixels Oriented median 120 Robustlydistinguish elongated filterbank structures from blob-like structuresHessian based 70 Describes local image descriptors characteristics ofblobs and tubes, such as local sharpness Blob statistics 80 Detects bloblike structures descriptors with statistics on blob shape, size, andcolor Gaussian derivative 20 Useful in extracting structures such asmicroaneurysms Color descriptor 30 Average color in RGB space in a localneighborhood Filterbank of Fourier 20 Extracts edge layout and localspectral descriptors textures, independent of local intensity LocalizedGabor 400 Extracts local spectral jets descriptors informationconcerning form and texture without sacrificing information about globalspatial relationships Filterbank of 80 Allows localization of smallmatched filters lesions such as microaneurysms. Can also be adapted forvessels Path opening 20 Effectively captures and closing localstructures, based morphological such as “curvy” vessels descriptorsfilterbank Filterbank of 200 Captures local texture information, localbinary can help achieve distinction patterns descriptors between lesionand background or other anatomical structures

Many of the descriptor sets are developed specifically for retinalimages, with a focus on low-level image processing. Measures of localimage properties alongside with some retinal fundus image specificmeasures at multiple scales can be used. Each of the descriptors listedbelow can be computed on scaled images I^(s) ⁰ , I^(s) ¹ . . . I^(s)^(n) . In one embodiment, the ratio between different scales is set to0.8 and 10 scales are used. Examples of multi-scale descriptors that canbe used for lesion localization and/or screening at each interest pixel(x_(int), y_(int)) are listed above in Table 3. The following providesinformation about one or more descriptors that may be used.

Morphological Filterbank Descriptors:

At each scale s_(k) a morphological filter can be applied to the imagewith the morphological filter computed over circles, squares, or regularpolygons or different sizes. For example, circles of different radii canbe used. In one embodiment, the median filtering is used as the saidmorphological filter. In this embodiment, at each scale s_(k) the mediannormalized RGB images A_(Norm,r) _(j) ^(s) ^(k) are computed (forexample, using Equation 1) with medians computed within circles ofdifferent values of radius r_(j), such that r_(j)>r_(j-1).

A _(diff,j-1) ^(s) ^(k) =A _(Norm,r) _(j) ^(s) ^(k) −A _(Norm,r) _(j-1)^(s) ^(k)

In one embodiment, median filterbank descriptor is A_(diff,j-1) ^(s)^(k) (x_(int), y_(int)) for all values of j. In one embodiment,r_(j)={7,15,31}, j=1,2,3, and r₀=3.

In one embodiment, the morphological filterbank descriptors are computedemploying the following: generating a first morphological filtered imageusing the retinal image, with a the said morphological filter computedover a first geometric shape; generating a second morphological filteredimage using the retinal image, with a morphological filter computed overa second geometric shape, the second geometric shape having one or moreof a different shape or different size from the first geometric shape;generating a difference image by computing a difference between thefirst morphological filtered image and the second median filtered image;and assigning the difference image pixel values as a descriptor valueeach corresponding to given pixel location of the said retinal image. Inone embodiment, the morphological filter employed is a median filter. Inone embodiment these descriptors are evaluated on a set of imagesobtained by progressively resizing the original image up and/or down bya set of scale-factors, so as to obtain a number or a vector of numbersfor each scale (“multi-scale analysis”), which are then concatenated tomake a composite vector of numbers (“multi-scale descriptors”).

Oriented Morphological Filterbank Descriptors:

At each scale s_(k) the oriented morphological filtered images arecomputed using structuring elements (geometric shapes) that resembleelongated structures, such as rectangles, ellipse, or the like. Thesefilters are applied at different orientations representing angular stepsof Δθ. Two different parameters of the structuring element (for example,length and width in case of a rectangular structuring element) are usedto compute two morphological filtered images at each orientation. Takingthe difference of these two images gives us the quantity of interest ateach pixel, which then forms part of the said oriented morphologicalfilterbank descriptors. In one embodiment, median filters are used asthe said morphological filter to obtain. In this embodiment, at eachscale s_(k) the oriented median normalized images

are computed (for example, using Equation 1) with medians computedwithin rectangular area

of length l and width w at angular steps of Δθ. In one embodiment,length l=30 and width w=2, and angular steps of Δθ=15 degrees are used.At each scale s_(k) the median normalized images

are computed (for example, using Equation 1) with medians computedwithin circle

of radius r. In one embodiment, a radius of r=3 is used.

I _(diff) ^(s) ^(k) =

−

Oriented median filterbank descriptor is I_(diff) ^(s) ^(k) (x_(int),y_(int)) at the different orientations.

These descriptors can distinguish elongated structures from blob-likestructures. The maximum or minimum value of the filterbank vector isidentified and the vector elements are rearranged by shifting eachelement by P positions until the said maximum or minimum value is in thefirst position, while the elements going out of the vector boundary arepulled back into the first position sometimes referred to as circularshifting.

In one embodiment, the oriented morphological filterbank descriptors arecomputed employing the following:

a. Computing morphological filtered image with the morphological filtercomputed over a circle or regular polygon (“structuring element” of themedian filter)

b. Computing another morphological filtered image with the morphologicalfilter computed over a geometric shape elongated structure, such as arectangle of specified aspect ratio (width, height) and orientation(angle) or an ellipse of specified foci and orientation (angle) of itsprincipal axis

c. Computing the difference image between the morphological filteredimages computed in (a) and in (b) and assign the difference image valueat a given pixel as its descriptor.

d. Computing a vector of numbers (“oriented median descriptors”) by (a)varying the orientation angle of the elongated structure and obtainingone number each for each orientation angle, and (b) stacking thuscomputed numbers into a vector of numbers.

In one embodiment, the maximum or minimum value of the orientedmorphological filterbank descriptor vector is identified and the vectorelements are rearranged by shifting each element by P positions untilthe said maximum or minimum value is in the first position, while theelements going out of the vector boundary are pulled back into the firstposition (“circular shifting”).

In one embodiment, these descriptors are evaluated on a set of imagesobtained by progressively resizing the original image up and/or down bya set of scale-factors, so as to obtain a number or a vector of numbersfor each scale (“multi-scale analysis”), which are then concatenated tomake a composite vector of numbers (“multi-scale descriptors”).

Gaussian Derivatives Descriptors:

Median normalized difference image is computed with radii r_(h) andr_(l), such that r_(h)>r_(l) at each scale s_(k).

I _(diff) ^(s) ^(k) =I _(Norm,r) _(h) ^(s) ^(k) −I _(Norm,r) _(l) ^(s)^(k)

This difference image I_(diff) ^(s) ^(k) is then filtered using Gaussianfilters G.

F ₀ =I _(diff) ^(s) ^(k) *G

The image after filtering with second derivative of the Gaussian is alsocomputed.

F ₂ =F ₀″

The Gaussian derivative descriptors are then F₀ (x_(int), Y_(int)) andF₂ (X_(int), y_(int)). These descriptors are useful in capturingcircular and ring-shaped lesions (for example, microaneurysms).

Hessian-Based Descriptors:

Median normalized difference image with bright vessels is computed withradii r_(h) and r₁, such that r_(h)>r₁ at each scale s_(k).

I _(diff) ^(s) ^(k) =I _(Norm,r) _(l) ^(s) ^(k) −I _(Norm,r) _(h) ^(s)^(k)

Then, Hessian H is computed at each pixel of the difference imageI_(diff) ^(s) ^(k) . Determinant of Hessian map L_(|H|) of thedifference image I_(diff) ^(s) ^(k) is the map of the determinant ofHessian H at each pixel. The sum modified Laplacian is computed todescribe the local image focus. Vesselness and structureness may becomputed, for example, as shown in FIG. 22. The Eigen values λ₁ and λ₂of H, such that |λ₁|≤λ₂ and their ratio |Δ₁|/λ₂, are evaluated. TheHessian based descriptor vector is collated from these values at theinterest pixel locations (x_(int), Y_(int)). These describe local imagecharacteristics of blobs and tubes, such as local sharpness.

Blob Statistics Descriptors:

Using the interest regions mask Z_(col) ^(s) ^(k) computed at scales_(k), the region properties listed in are measured at each blob. Theinterest pixels within a particular blob region are assigned with thesame blob statistics descriptor.

Table 4 is one embodiment of blob properties used as descriptors.

TABLE 4 Blob property Description Area Number of pixels in the blobregion Filled Area Number of pixels in the filled region PerimeterPerimeter of the blob which approximates the contour as a line throughthe centers of border pixels using a 4-connectivity Extent Ratio ofpixels in the blob region to pixels in the total bounding box for theblob Eccentricity Eccentricity of the ellipse that has the samesecond-moments as the region. Maximum intensity Value of greatestintensity in the blob region Minimum intensity Value of lowest intensityin the blob region Average intensity Value of mean intensity in the blobregion

Color Descriptors:

Average color is measured in a square block of length l centered at thepixel of interest. The color in RGB space is used as the colordescriptor for the pixel. In one embodiment, smoothing square of lengthl=5 is used.

Filterbank of Fourier Spectral Descriptors:

The natural logarithm of the Fourier transform magnitude and firstderivative of Fourier transform phase of a patch of image

centered at the pixel of interest at various frequencies are computed.These descriptors are invariant to rotation and scaling and can surviveprint and scanning. The natural logarithm of Fourier transform magnitudeof the image patch can be computed as follows:

F ₁(ω)=ln(|

(

)|)

F ₂(ω)=d(φ(

(

)))/dω

where F₁(ω) and F₂ (ω) are the fourier spectral descriptors,

is the fourier transform operation at frequency ω and φ denotes phase.

Localized Gabor Jets Descriptors:

Gabor jets are multi resolution Gabor features, constructed fromresponses of multiple Gabor filters at several frequencies andorientations. Gabor jet descriptors are computed as follows:

${G( {x,y,\lambda,\psi,\sigma,\gamma} )} = {{\exp ( {- \frac{x^{\prime 2} + {\gamma^{2}y^{\prime 2}}}{2\sigma^{2}}} )}{\cos ( {{2\pi \frac{x^{\prime}}{\lambda}} + \psi} )}}$where, x^(′) = x cos (θ) + y sin (θ)y^(′) = −x sin (θ) + y cos (θ)

λ is the wavelength of the sinusoidal factor, θ is the orientation ofthe normal to the striping of the Gabor function, ψ is the phase offset,σ is the standard deviation of the Gaussian envelope and γ is thespatial aspect ratio.

Filterbank of Matched Filters:

2D Gaussian filter is used as a kernel for multi-resolution matchfiltering. Gaussian filters of a range of sigmas are used as thefilterbank as follows:

${G( {x,y,\sigma} )} = {\exp ( {- \frac{x^{\prime 2} + y^{\prime 2}}{2\sigma^{2}}} )}$

Path Opening and Closing Based Morphological Descriptors Filterbank:

Path opening and closing based morphological descriptors use flexibleline segments as structuring elements during morphological operations.Since these structuring elements are adaptable to local imagestructures, these descriptors may be suitable to describe structuressuch as vessels.

Filterbank of local binary patterns descriptors:

Local binary patterns (LBP) capture texture information in images. Inone embodiment, a histogram with 20 bins to describe the LBP images isused.

c. Lesion Classification

In one embodiment, a support vector machine (SVM) is used for lesionclassification. In other embodiments, classifiers such as k-nearestneighbor, naive Bayes, Fisher linear discriminant, deep learning, orneural networks can be used. In another embodiment, multiple classifierscan be used together to create an ensemble of classifiers. In oneembodiment, four classifiers—one classifier for each of cottonwoolspots,exudates, hemorrhages, and microaneurysms—are trained and tested. In oneembodiment, ground truth data with lesion annotations on 100 images isused for all lesions, plus more than 200 images for microaneurysms. Theannotated dataset is split in half into training and testing datasets,and interest region detector is applied on the training dataset images.The detected pixels are sampled such that the ratio of the number ofpixels of a particular category of lesion in the training dataset tothose labeled otherwise remains a constant referred to as the balancefactor B. In one embodiment, B=5 for cottonwoolspots, exudates, andhemorrhages classifiers, and B=10 for microaneurysms.

In one embodiment, interest region detector is applied on the testingdataset images. The detected pixels are classified using the 4 differentlesion classifiers noted above. Each pixel then has 4 decisionstatistics associated with it. A decision statistic for a particularpixel is generated by computing the distance of the given element fromthe given lesion classification hyper plane defined by the supportvectors in the embodiment using SVM for lesion classification or in theembodiment using Fisher linear discriminant or the like. In case of theembodiment using a naive Bayes classifier or the embodiment using thek-nearest neighbor, the class probability for lesion class andnon-lesion class are computed and are used as the decision statistic.

d. Joint Recognition-Segmentation

In one embodiment, a biologically inspired framework is employed forjoint segmentation and recognition in order to localize lesions.Segmentation of interest region detector outputs the candidate lesion ornon-lesion blobs. The decision statistic output from pixel-levelclassifiers can provide evidence to enable recognition of these lesions.These decision statistics from different pixels and different lesiontypes are pooled within each blob to arrive at a blob-level recognition.The pooling process may include computing the maximum, minimum or theaverage of decision statistics for a given lesion type for all thepixels in a given blob. This process can be repeated iteratively,although in some embodiments, a single iteration can be sufficient. FIG.26A shows an example embodiment of microaneurysm localization. FIG. 26Bshows an example embodiment of hemorrhages localization. FIG. 26C showsan example of exudates localization. FIG. 27 illustrates one embodimentof a graph that quantifies the performance of lesion detection.

In another embodiment, the pixel level decision statistics over eachblob and building secondary descriptors can be combined. Secondarydescriptors can be one or more of the following:

-   -   Average value of the pixel decision statistics;    -   Bag of words (BOW) descriptors aggregated at blob level; or    -   Histogram of pixel decision statistics.

These aggregated descriptors can then be used to train blob-level lesionclassifiers and can be used to recognize and/or segment lesions. Thesedescriptors can also be used for screening.

E. Lesion-Based Biomarkers

1. Lesion Dynamics

Some embodiments pertain to computation of lesion dynamics, whichquantifies changes in the lesions over time.

FIG. 28 shows various embodiments of a lesion dynamics analysis systemand process. The patient 289000 is imaged by an operator 289016 using animage capture device 289002. In this embodiment, the image capturedevice is depicted as a retinal camera. The current image captured289010 is sent to the computer or computing device 289004. Images fromprevious visits 289100 can be obtained from a datacenter 289104. Lesiondynamics analysis 289110 is performed on the same computer or computingdevice 289004, on the cloud 289014, a different computer or computingdevice 289004, a mobile device 289008, or the like. The results arereceived by computer 289004 and then sent to a healthcare professional289106 who can interpret the results and report the diagnosis to thepatient. In one embodiment, the patient 289000 can take the image 289012himself using an image capture device 289006, for example, a retinalcamera attachment for a mobile device 289008. The images from previousvisits 289102 are downloaded to the mobile device from the datacenter289104. Lesion dynamics analysis is performed on the mobile device, onthe cloud 289014, or a computer or computing device 289004, on adifferent mobile device, or the like. The results of the analysis areprovided to the mobile device 289008, which performs an initialinterpretation of the results and presents a diagnosis report to thepatient. The mobile device 289008 can also notify the healthprofessional if the images contain any sign of disease or items ofconcern.

FIG. 29A depicts an example of one embodiment of a user interface of thetool for lesion dynamics analysis depicting persistent, appeared, anddisappeared lesions. The user can load the images from a database byinputting a patient identifier and range of dates for analysis. Asdepicted in the embodiment shown in FIG. 29B, when the user clicks on“View turnover,” the plots of lesion turnover for the chosen lesions aredisplayed. As depicted in the embodiment shown in FIG. 29C, when thetoggle element to change from using the analysis to viewing the overlaidimages is utilized, longitudinal images for the selected field betweenthe selected two visits are shown. The user can change the transparencyof each of the image using the vertical slider.

In one embodiment, longitudinal retinal fundus images are registered tothe baseline image as described in the section above entitled “ImageRegistration”. On each of the images, including the baseline image,lesions are localized as described in the section above entitled “LesionLocalization”. In some embodiments, characterizing dynamics of lesionssuch as exudates (EX) and microaneurysms (MA) may be of interest. In oneembodiment, the appearance and disappearance of MA, also referred to asMA turnover is considered. The first image in the longitudinal series isreferred to as the baseline image I_(b) and any other registeredlongitudinal image is denoted as I_(l).

FIG. 30 illustrates an embodiment used in evaluating lesion dynamics.The blocks shown here can be implemented on the cloud 289014, a computeror computing device 289004, a mobile device 289008, or the like as, forexample, shown in FIG. 28. The input source image 308 and destinationimage 314 refer to a patient's retinal data, single or multidimensional,that has been captured at two different times using a retinal imagingdevice, such as cameras for color image capture, fluorescein angiography(FA), adaptive optics, optical coherence tomography (OCT), hyperspectralimaging, scanning laser ophthalmoscope (SLO), wide-field imaging orultra-wide-field imaging. Image 100 is input into the lesionlocalization module 112. FIG. 13A illustrates an embodiment of the imageregistration block 310. Lesion dynamics module 500 computes changes inlesions across retinal images imaged at different times. Lesion changescan include appearance, disappearance, change in size, location, or thelike.

a. Lesion Matching for MA Turnover Computation

In one embodiment, binary images B_(b) and B_(l) with lesions ofinterest marked out are created for the baseline and longitudinalimages. Lesion locations are labeled in B_(b) and compared to thecorresponding regions in B_(l) with a tolerance that can, for example,be specified by maximum pixel displacement due to registration errors.The labeled lesion is marked as persistent if the corresponding regioncontains a MA, else it is marked as a disappearing MA. Labelingindividual lesions in B_(l) and comparing them to corresponding regionsin B_(b) gives a list of newly appeared lesions. FIGS. 31A, 31B, 31C and31D depict embodiments and examples of longitudinal images forcomparison to identify persistent, appeared and disappeared lesions. Theimages are zoomed to view the lesions. FIG. 31A shows the baselineimage. FIG. 31B shows the registered longitudinal image. FIG. 31C showslabeled MAs in the baseline image with persistent MAs indicated byellipses and non-persistent MAs by triangles. FIG. 31D shows labeled MAsin the longitudinal image with persistent MAs indicated by ellipses.Counting the newly appeared lesions and disappeared lesions over theperiod of time between the imaging sessions allows computation of lesionturnover rates, or MA turnover if the lesion under consideration is MA.

In another embodiment, the baseline image I_(b) and registeredlongitudinal image I_(l) are used rather than the registered binarylesion maps. Potential lesion locations are identified using theinterest region detector as, for example, described in the section aboveentitled “Interest Point Detection”. In one embodiment, these pixels arethen classified using lesion classifier, for example, as described inthe lesion localization section using, for example, descriptors listedin Table 3. The regions with high certainty of including lesions inI_(b), as expressed by the decision statistics computed over the pixels,are labeled. In one embodiment, these regions are then matched withcorresponding regions in I_(l) with a tolerance, for example, asspecified by maximum pixel displacement which may be due to registrationerrors using decision statistics. In one embodiment, regions withmatches to the labeled lesions with high confidence are then consideredto be persistent lesions and labeled regions with no matches areconsidered to be disappeared lesions. Newly appearing lesions can befound by labeling image I_(l) and comparing those regions tocorresponding regions in I_(b) to identify newly appearing lesions.

b. Increased Reliability and Accuracy in Turnover Computation

Some factors can confound lesion turnover computation such as MAturnover computation, variation in input images, errors in imagealignment, or errors in MA detection and localization. Some errors cancascade and cause the MA turnover computed to be drastically differentfrom the actual value, which could be a failure for the tool. In someembodiments, a system that gracefully degrades when faced with the aboveconfounding factors is desirable. At each stage, rather than making abinary decision, the probability that a blob is classified as an MA orthe probability that two blobs are marked as matched MAs and hencepersistent is estimated. As noted above, a blob includes a group ofpixels with common local image properties and chosen by the interestregion detector. FIG. 32A shows a patch of retina with microaneurysms.FIG. 32B shows the ground truth labelling for microaneurysms in theimage patch shown in FIG. 32A. FIG. 32C shows the detected MAs marked bydisks with the corresponding confidence levels indicated by thebrightness of the disk. An estimated range for MA turnover is computedrather than a single number. A larger range may represent someuncertainty in the turnover computation, nevertheless it can provide theclinician with useful diagnostic information. In one embodiment, one ormore of the following is performed when confounding factors are present.

-   -   i. Handling quality variations in input image: The quality of        the input images can vary as they are images at different time,        possibly using different imaging systems and by different        operators. The quality of the image can be inferred locally. The        quality of the sections of the image can be used as a weight to        infer confidence in MA detection along with the classifier        decision statistic.    -   ii. Local registration refinement for global image alignment        error correction: Registration errors can occur due to lack of        matching keypoints between images. Local refinement of        registration using a small image patch centered on the putative        microaneurysm can be used to correct these errors. FIG. 33A        shows baseline and Month 6 images registered and overlaid.        Misalignment causes persistent MA to be wrongly identified as        disappeared and appeared. FIG. 33B shows the baseline image, as        grayscale image of the enhanced green channel only. The dotted        box shows region centered around the detected MA, with inset        showing zoomed version. FIG. 33C shows Month 6 image, as        grayscale image of the enhanced green channel only. The dotted        region around MA in FIG. 33B is correlated with the image shown        in FIG. 33C to refine the registration. The dotted box in FIG.        33C corresponds with the box in FIG. 33B, and the solid box in        FIG. 33C indicates the new location after refinement. MA is now        correctly identified as persistent. When the local patches are        aligned, the putative microaneurysms are then matched to        evaluate persistent MAs.    -   iii. Robust persistent microaneurysm classification:        Probabilities can be used to represent the classification of a        given blob into microaneurysm or otherwise. Persistent MAs are        marked in the ground truth representation and will describe        pairs of blobs with the histogram decision statistics of the        pixels in the blobs along with similarity of the blobs. The        labeled persistent MAs can be used to train a SVM classifier.        Given a pair of putative blobs in the neighborhood after local        registration refinement, the probability that these blobs are a        persistent MA pair is computed.

As shown in embodiments of FIGS. 34A and 34B, the range for turnovernumbers is then assessed from the blob level probabilities andpersistent MA pair probabilities using thresholds identified from theground truth.

F. Encounter-Level Processing Framework

Medical and retinal images captured during a given visit of a givenpatient are typically captured using the same imaging set-up. The set ofthese images is termed an encounter (of that patient on that date). Theanalysis of the images in a given encounter can be performed jointlyusing data from all the images. For example, the presence or absence oflesions in one eye of a given patient can be determined after examiningall the images captured of that eye.

In one embodiment, a method for detection of regions with abnormality inmedical (particularly retinal) images using one or at least two or moreimages obtained from the same patient in the same visit (“encounter”)can include one or more of the following:

a. Identifying a subset of images for further analysis based on imagequality, image content, such as the image being a lens shot or anon-retinal image, or of poor quality or fidelity;

b. For each image identified in (a) designating some pixels in the imageas active pixels, meaning they contain the interesting regions of theimage, using of one or more techniques from (i) conditional numbertheory, (ii) multi-scale interest region detection, (iii) vasculatureanalysis, and (iv) structured-ness analysis;

c. For each image identified in (a), computing a vector of numbers(“primary descriptors”) at each of the pixels identified in (b) usingone or at least two or more types from (i) median filterbankdescriptors, (ii) oriented median filterbank descriptors, (iii) Hessianbased descriptors, (iv) Gaussian derivatives descriptors, (vi) blobstatistics descriptors, (vii) color descriptors, (viii) matched filterdescriptors, (ix) path opening and closing based morphologicaldescriptors, (x) local binary pattern descriptors, (xi) local shapedescriptors, (xii) local texture descriptors, (xiii) local Fourierspectral descriptors, (xiv) localized Gabor jets descriptors, (xv) edgeflow descriptors, (xvi) edge descriptors such as difference ofGaussians, (xvii) focus measure descriptors such as sum modifiedLaplacian, (xix) saturation measure descriptors, (xx) contrastdescriptors, or (xxi) noise metric descriptors;

d. For each image, for each pixels identified in (b), computingpixel-level classifier decision statistic (a number quantifying thedistance from the classification boundary) using supervised learningutilizing the primary descriptors computed in (c) using one or more of(i) support vector machine, (ii) support vector regression, (iii)k-nearest neighbor, (iv) naive Bayes, (v) Fisher linear discriminant,(vi) neural network, (vii) deep learning, (viii) convolution networks,or (ix) an ensemble of one or more classifiers including from(i)-(viii), with or without bootstrap aggregation;

e. For each image identified in (a), computing a vector of numbers(“image-level descriptors”) by using one or least two or more typesfrom:

-   -   i. histogram of pixel-level classifier decision statistics        computed in (d);    -   ii. descriptors based on dictionary of codewords of pixel-level        descriptors (primary descriptors) computed in (c) aggregated at        image level; or    -   iii. histogram of blob-level decision statistic numbers (one        number per blob) computed as mean, median, maximum, or minimum        of pixel-level classifier decision statistics computed in (d)        for all pixels belonging to the blob;

f. Combining the image-level descriptors computed in (e) with or withoutfurther processing for the subset of images identified in (a) to obtainencounter-level descriptors;

g. Classifying encounters using encounter-level descriptors computed in(f) as normal or abnormal (one classifier each for each abnormality,lesion, or disease) using one or more of supervised learning techniquesincluding but not limited to: (i) support vector machine, (ii) supportvector regression, (iii) k-nearest neighbor, (iv) naive Bayes, (v)Fisher linear discriminant, (vi) neural network, (vii) deep learning,(viii) convolution networks, or (ix) an ensemble of one or moreclassifiers including from (i)-(viii), with or without bootstrapaggregation.

In another embodiment, the combining image-level descriptors intoencounter-level descriptors for the images of the patient visit(encounter) identified in (a) is achieved using operations that includebut are not limited to averaging, maximum, minimum or the like acrosseach index of the descriptor vector, so that the said encounter-leveldescriptors are of the same length as the image-level descriptors.

In another embodiment, the combining image-level descriptors for theimages of the patient visit (encounter) identified in (a) to obtainencounter-level descriptors is achieved using a method including: (i)combining image-level descriptors to form either the image field-of-view(identified from meta data or by using position of optic nerve head andmacula)-specific or eye (identified from meta data or by using positionof optic nerve head and macula)-specific descriptors, or (ii)concatenating the field-specific or eye-specific descriptors into theencounter level descriptors.

1. Ignoring Lens Shot Images

Images in an encounter can be identified to be lens shot images, using,for example, the method described in the section above entitled “LensShot Image Classification.” These lens shot images can be ignored andexcluded from further processing and analysis since they may not providesignificant retinal information. The images that are not retinal fundusimages are ignored in this part of the processing.

2. Ignoring Poor Quality Images

Images in an encounter can be identified as having poor quality using,for example, the method described in the section above entitled “ImageQuality Assessment.” These poor quality images can be excluded fromfurther processing and analysis since the results obtained from suchimages with poor quality are not reliable. If a given encounter does nothave the required number of adequate/good quality images then thepatient is flagged to be re-imaged.

3. Ways of Creating Encounter-Level Decisions

a. Merging Image-Level Primary Descriptors

Encounter-level descriptors can be obtained by combining image-levelprimary descriptors, many of which are described in the sections aboveentitled “Processing That Can Be Used To Locate The Lesions.” and“Features that can be used for this type of automatic detection”. In oneembodiment, the image level descriptors include one or more types from:

-   -   i. histogram of pixel-level classifier decision statistics        computed;    -   ii. descriptors based on dictionary of codewords of pixel-level        descriptors (primary descriptors) aggregated at image level; or    -   iii. histogram of blob-level decision statistic numbers (one        number per blob) computed as mean, median, maximum, or minimum        of pixel-level classifier decision statistics computed for        pixels belonging to the blob.

In one embodiment, the encounter-level descriptors can be evaluated asthe maximum value across all the image level descriptors for the imagesthat belong to an encounter or created by concatenating eye leveldescriptors. In one embodiment, the computation of encounter-leveldescriptors for the images of the patient visit (encounter) is achievedusing a method comprising (i) combining image-level descriptors to formeither the image field-of-view, specific descriptors (identified frommetadata or by using position of ONH as described in the section aboveentitled “Optic Nerve Head Detection” or by using the position of theONH and macula) or eye-specific descriptors (identified from metadata orposition of ONH and macula or the vector from the focus to the vertex ofthe parabola that approximates the major vascular arch) using operationssuch as maximum, average, minimum or the like, and (ii) concatenatingthe field-specific or eye-specific descriptors into the encounter leveldescriptors. These encounter-level descriptors can then be classified,for example, using classifiers described in the section below entitled“Diabetic Retinopathy Screening” to obtain the encounter-leveldecisions. Combination of image level descriptors to form encounterlevel descriptors is discussed in further detail in section “Multi-LevelDescriptors For Screening”.

b. Merging image-level decision statistics

Encounter-level decisions can also be made by combining image-leveldecision statistics histograms using average, maximum, and minimumoperations, or the like.

VI. Automated Screening

Methods, systems and techniques described can also be used to automatescreening for various medical conditions or diseases, which can helpreduce the backlog of medical images that need to be screened. One ormore of the techniques described earlier or in the following sectionsmay be used to implement automated screening; however, using thesetechniques is not required by for every embodiment of automatedscreening.

A. Screening for Retinal Diseases

FIG. 35 shows one embodiment of scenarios in which disease screening canbe applied. In one scenario, the patient 359000 is imaged by an operator359016 using an image capture device 359002. In the illustratedembodiment, the image capture device is a retinal camera. The imagescaptured are sent to a computer or computing device 359004 for furtherprocessing or transmission. In one embodiment all captured images 359010from the computer or computing device are sent for screening analysiseither on the cloud 359014, on a computer or computing device 359004, ona mobile device 359008, or the like. In another embodiment only goodquality images 359010 from the computer or computing device are sent forscreening analysis either on the cloud 359014, on the computer orcomputing device 359004, on the mobile device 359008, or the like. Thescreening results are sent to a healthcare professional 359106 whointerprets the results and reports the diagnosis to the patient. In thesecond scenario, the patient 359000 takes the image himself using animage capture device 359006, which in this case is shown as a retinalcamera attachment for a mobile device 359008. All images or just goodquality images 359012 from the mobile phone are sent for screeninganalysis. The results of the analysis are returned to the mobile device,which performs an initial interpretation of the results and presents adiagnosis report to the patient. The mobile device also notifies thehealth professional if the images contain any signs of disease or otheritems of concern.

FIG. 36 depicts an example of embodiments of the user interface of thetool for screening. FIG. 36A and FIG. 36B describe the user interfacefor single encounter processing whereas FIG. 36C and FIG. 36D describethe user interface for batch processing of multiple encounters. In FIG.36A, a single encounter is loaded for processing and when the userclicks on “Show Lesions,” the detected lesions are overlaid on theimage, as shown in FIG. 36B. An embodiment of a user interface of thetool for screening for multiple encounters is shown in FIG. 36C, and thedetected lesions overlaid on the image are displayed when the userclicks on “View Details,” as shown in FIG. 36B.

The embodiments described above are adaptable to different embodimentsfor screening of different retinal diseases. Additional embodiments aredescribed in the sections below related to image screening for screeningfor diabetic retinopathy and image screening for screening forcytomegalovirus retinitis.

a. Multi-Level Descriptors for Screening

FIG. 37 discloses one embodiment of an architecture for descriptorcomputation at various levels of abstraction. The illustrated blocks maybe implemented on the cloud 19014, a computer or computing device 19004,or a mobile device 19008, or the like, as shown in FIG. 1. Pixel leveldescriptors 3400 are computed, using for example the process describedin the section above entitled “Lesion Classification”. Lesionclassifiers for microaneurysms, hemorrhages, exudates, orcottonwoolspots are used to compute a decision statistic for each ofthese lesions using the pixel level descriptors. Pixels are grouped intoblobs based on local image properties, and the lesion decisionstatistics for a particular lesion category of all the pixels in a groupare averaged to obtain blob-level decision statistic 3402. Histograms ofpixel-level and blob averaged decision statistics for microaneurysms,hemorrhages, exudates, or cottonwoolspots are concatenated to buildimage level descriptors 3404. Alternatively, image level descriptorsalso include bag of words (BOW) descriptors, using for example theprocess described in the section above entitled “Description WithDictionary of Primary Descriptors”. Eye-level descriptors 3406 areevaluated as the maximum value across all the image level descriptorsfor the images that belong to an eye. Images that belong to a particulareye can be either identified based on metadata, inferred from fileposition in an encounter or deduced from the image based on relativepositions of ONH and macula. Encounter-level descriptors 3408 areevaluated as the maximum value across all the image level descriptorsfor the images that belong to an encounter. Alternatively,encounter-level descriptors can be obtained by concatenating eye-leveldescriptors. Lesion dynamics computed for a particular patient frommultiple encounters can be used to evaluate patient level descriptors3410.

b. Hybrid Classifiers

Ground truth labels for retinopathy and maculopathy can indicate variouslevels of severity, for example R0, R1, M0 and so on. This informationcan be used to build different classifiers for separating the various DRlevels. In one embodiment, improved performance can be obtained forclassification of R0M0 (no retinopathy, no maculopathy) cases from otherdisease cases on Messidor dataset by simply averaging the decisionstatistics of the no-retinopathy-and-no-maculopathy (“R0M0”) versus therest classifier, and no-or-mild-retinopathy-and-no-maculopathy(“R0R1M0”) versus the rest classifier. (A publically available datasetis kindly provided by the Messidor program partners athttp://messidor.crihan.fr/.) One or more of the following operations maybe applied with the weights w_(t) on each training element initializedto the same value on each of the classifier h_(t) obtained. In someembodiments, the operations are performed sequentially.

1. With the training dataset weighted the best remaining classifierh_(t) is applied to evaluate AUROC A_(t). The output weight α_(t) forthis classifier is computed as below:

$\alpha_{t} = {\frac{1}{2}\ln \frac{A_{t}}{1 - A_{t}}}$

2. The weight distribution w_(t+1) on the input training set for thenext classifier is computed as below:

w _(t+1)(i)=w _(t)(i)exp α_(t)(2(y _(i) ≠h _(t)(x _(i)))−1)

where, x_(t),y_(i) are the classifier inputs and the correspondinglabels.

The output weights α_(t) are used to weight the output of each of theclassifiers to obtain a final classification decision statistic.

c. Ensemble Classifiers

In one embodiment, ensemble classifiers are employed, which are a set ofclassifiers whose individual predictions are combined in a way thatprovides more accurate classification than the individual classifiersthat make them up. In one embodiment, a technique called stacking isused, where an ensemble of classifiers, at base level, are generated byapplying different learning algorithms to a single dataset, and thenstacked by learning a combining method. Their good performance is provedby the two top performers at the Netflix competition using, for example,techniques disclosed in Joseph Sill et al., Feature-Weighted LinearStacking, arXiv e-print, Nov. 3, 2009. The individual weak classifiers,at the base level, may be learned by using algorithms such as decisiontree learning, naive Bayes, SVM, or multi response linear regression.Then, at the meta level, effective multiple-response model trees areused for stacking these classifier responses.

d. Deep Learning

In another embodiment, the system employs biologically plausible, deepartificial neural network architectures, which have matched humanperformance on challenging problems such as recognition of handwrittendigits, including, for example, techniques disclosed in Dan Cirean, UeliMeier, and Juergen Schmidhuber, Multi-Column Deep Neural Networks forImage Classification, arXiv e-print, Feb. 13, 2012. In otherembodiments, traffic signs, or speech recognition are employed, using,for example, techniques disclosed in M. D. Zeiler et al., “On RectifiedLinear Units for Speech Processing,” 2013. Unlike shallow architectures,for example, SVM, deep learning is not affected by the curse ofdimensionality and can effectively handle large descriptors. In oneembodiment, the system uses convolution networks, sometimes referred toas cony-nets, based classifiers, which are deep architectures that havebeen shown to generalize well for visual inputs.

B. Types Of Diseases

1. Diabetic Retinopathy Screening

a. General Description

In one embodiment, the system allows screening of patients to identifysigns of diabetic retinopathy (DR). A similar system can be applied forscreening of other retinal diseases such as macular degeneration,hypertensive retinopathy, retinopathy or prematurity, glaucoma, as wellas many others.

When detecting DR, two DR detection scenarios are often of interest: (i)detecting any signs of DR, even for example a single microaneurysm (MA)since the lesions are often the first signs of retinopathy or (ii)detecting DR onset as defined by the Diabetes Control and ComplicationsTrial Control and Group, that is, the presence of at least three MAs orthe presence of any other DR lesions. The publicly available Messidordataset, which contains 1200 retinal images that have been manuallygraded for DR and clinically significant macular edema (CSME), can beused for testing the system. In one embodiment, the screening system,when testing for this Messidor dataset, uses >5MAs or >0 Hemorrhages(HMs) as criteria for detecting DR onset. For both of the detectionscenarios, the goal is to quantify working on cross-dataset testing,training on a completely different data, or on a 50-50 test-train splitof the dataset.

FIG. 38 depicts one embodiment of a pipeline used for DR screening. Theillustrated blocks may be implemented either on the cloud 19014, acomputer or computing device 19004, a mobile device 19008, or the like,as shown in FIG. 1. The image 100 refers in general to the retinal data,single or multidimensional, that has been captured using a retinalimaging device, such as cameras for color image capture, fluoresceinangiography (FA), adaptive optics, optical coherence tomography (OCT),hyperspectral imaging, scanning laser ophthalmoscope (SLO), wide-fieldimaging or ultra-wide-field imaging. Image 100 is input to fundus maskgeneration block 102 and image gradability computation block 104 andimage enhancement module 106 if the image is of sufficient quality.Interest region identification block 108 and descriptor set computationblock 110 feed into lesion localization block 112 which determines themost likely label and/or class of the lesion and extent of the lesion.This output can be used for multiple purposes such as abnormalityscreening, diagnosis, or the like. DR screening block 114 determineswhether a particular fundus image includes abnormalities indicative ofdiabetic retinopathy such that the patient should be referred to anexpert.

In one embodiment, two approaches can be used in the system: one for the50-50 train/test split and the other for the cross-dataset testing withtraining on one dataset and testing on another. One embodiment uses theMessidor dataset and the DEI dataset (kindly provided by Doheny EyeInstitute) which comprises 100 field 2 images with four lesionsdiligently annotated pixel-wise (MA, HM, EX and CW), and 125 field 2images with MAs marked. When using the system on these datasets, theannotations performed precisely, often verifying the annotations usingthe corresponding fluorescein angiography (FA) images. This preciseannotation sets high standards for the automated lesion localizationalgorithms, especially at lesion-level.

b. Features that can be Used for Automatic Detection

i. Description with Dictionary of Primary Descriptors

In this embodiment, a dictionary of low-level features is computed byunsupervised learning of interesting datasets, referred to as codewords.The dictionary may be computed by technology disclosed in J. Sivic andA. Zisserman, “Video Google: A Text Retrieval Approach to ObjectMatching in Videos,” in 9th IEEE International Conference on ComputerVision, 2003, 1470-1477. Then an image is represented using a bag ofwords description, for example a histogram of codewords found in theimage. This may be performed by finding the codeword that is closest tothe descriptor under consideration. The descriptors for an image areprocessed in this manner and contribute to the histogram.

A 50-50 split implies that training is done with half the dataset andtesting is done on the other half. The computation of the dictionary canbe an offline process that happens once before the system or method isdeployed. In one embodiment, the unsupervised learning dataset isaugmented with descriptors from lesions. In an example implementation,the descriptors from lesions locations annotated on the DEI dataset areused. For this example implementation, the total number of descriptorscomputed is N_(DEI) and N_(Mess), for DEI and Messidor datasets,respectively. Then N_(Mess)≈mN_(DEI) where m≥1.0 can be any real number,with each Messidor training image contributing equally to the N_(Mess)descriptor count. In one embodiment, m is set to 1 and in anotherembodiment, it is set to 5. The random sampling of interesting locationsallows signatures from non-lesion areas to be captured. The computedN_(Mess)+N_(DEI) descriptors are pooled together and clustered into Kpartitions using K-means clustering, the centroids of which giveK-codewords representing the dictionary. The K-means clustering may beperformed using techniques disclosed in James MacQueen, “Some Methodsfor Classification and Analysis of Multivariate Observations,” inProceedings of the Fifth Berkeley Symposium on Mathematical Statisticsand Probability, Vol. 1, 1967, 14.

After the dictionary computation, the bag of words based (BOW) secondarydescriptors are computed. In one embodiment, for each image, the lesiondescriptors 110 are computed. Using vector quantization, each descriptoris assigned a corresponding codeword from the previously computeddictionary. The vector quantization may be performed using techniquesdisclosed in Allen Gersho and Robert M. Gray, Vector Quantization andSignal Compression (Kluwer Academic Publishers, 1992). This assignmentcan be based on which centroid or codeword is closest in terms ofEuclidean distance to the descriptor. A normalized K-bin histogram isthen computed representing the frequency of codeword occurrences in theimage. The histogram computation does not need to retain any informationregarding the location of the original descriptor and therefore theprocess is referred to as “bagging” of codewords. These descriptors arereferred to as bag of words (BOW) descriptors.

Table 5 is comparison of embodiments of the screening methods. Theresults for one embodiment are provided for reference alone, noting thatthe other results are not cross dataset. “NA” in the table indicates thenon-availability of data. The column labelled “Quellec” provides resultswhen applying the method described in Gwénolé Quellec et al., “AMultiple-Instance Learning Framework for Diabetic RetinopathyScreening,” Medical Image Analysis 16, No. 6 (August 2012): 1228-1240,the column labelled “Sanchez” shows results when applying the methoddescribed in C. I. Sanchez et al., “Evaluation of a Computer-AidedDiagnosis System for Diabetic Retinopathy Screening on Public Data,”Investigative Ophthalmology & Visual Science 52, No. 7 (Apr. 28, 2011):4866-4871, and the column labelled “Barriga” shows results when applyingthe method of E. S. Barriga et al., “Automatic System for DiabeticRetinopathy Screening Based on AM-FM, Partial Least Squares, and SupportVector Machines,” in 2010 IEEE International Symposium on BiomedicalImaging: From Nano to Macro, 2010, 1349-1352.

TABLE 5 System System embodiment Embodiment one two Quellec . . .Sanchez . . . Barriga . . . AUROC 0.915 0.857 0.881 0.876 0.860sensitivity specificity 50% 95% 88% 92% 92% NA 75% 88% 82% 86% 83% NAspecificity sensitivity 90% 70% 39% 66% 55% NA 85% 82% 62% 75% 65% NA

In one embodiment, after the BOW descriptors have been computed for theimages, they are subjected to term frequency−inverse document frequency(tf-idf) weighting, using, for example, techniques disclosed inChristopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze,Introduction to Information Retrieval, Vol. 1 (Cambridge UniversityPress Cambridge, 2008). This is done to scale down the impact ofcodewords that occur very frequently in a given dataset and that areempirically less informative than codewords that occur in a smallfraction of the training dataset, which might be the case with “lesion”codewords. In some embodiments, the inverse document frequency (idf)computation is done using the BOW descriptors of the training datasetimages. In addition, during computation of document frequency, adocument may be considered if the raw codeword frequency in it is abovea certain threshold T_(df). The tf-idf weighting factors computed ontraining dataset are stored and reused on the BOW descriptors computedon the images in the test split of Messidor dataset during testing.

In one embodiment, the system adds a histogram of the decisionstatistics (for example, the distance from classifier boundaries) forpixel level MA and HM classifiers. This combined representation may beused to train a support vector machine (SVM) classifier using the 50-50test/train split. In one embodiment, the number of descriptors computedis N_(Mess)≈N_(DEI)≈150,000, and these 300K descriptors are clustered toget K=300 codewords. In addition, the document frequency computation mayuse T_(df)=0, but for other embodiments may use T_(df)=3. Theseparameter choices of these embodiments result in an impressive ROC curvewith AUROC of 0.940 for DR onset and 0.914 for DR detection as shown inTable 5 and FIG. 39. These are the best results among those reported inliterature for the Messidor dataset.

In addition, in one embodiment, a histogram of blob-level decisionstatistics that is computed using one or more of the followingoperations is added: (i) computation of the blobs in the image atvarious scales using the detected pixels, (ii) computation of theaverage of the decision statistics to obtain one number per blob, (iii)training of one or more another classifiers for lesions using theblob-level decision statistics as the feature vector and use the newdecision statistic, or (iv) computation of one or more histograms ofthese decision statistics to form a blob-level histogram(s) descriptor.In one embodiment, these histogram descriptors are normalized to sum to1 so as to mathematically look like a probability distribution.

As discussed above, different descriptor types may be combined invarious embodiments, this does not preclude the use of any individualdescriptor type, or an arbitrary combination of a subset of descriptortypes.

c. Screening Using Lesion Classifiers Trained on Another Dataset(Cross-Dataset Testing)

In another embodiment, the method or system could be applied to across-dataset scenario. This implies that the testing is done on acompletely new, unseen dataset. In an example implementation,cross-dataset testing is applied on all 1200 Messidor images without anytraining on this dataset. Instead, the system uses the decisionstatistics computed for the various lesions. These statistics are thedistances from classifier boundaries, with the classifier being trainedon the expert-annotated images. In this example implementation, 225images from the DEI dataset are employed. The ROC curves for thisexample implementation, shown in FIG. 40, demonstrate an impressivecross-dataset testing performance, especially for detecting DR onset(AUROC of 0.91). For detecting any signs of DR, the AUROC of 0.86convincingly beats the best reported in literature, including crossdataset AUROC of 0.76 disclosed in Quellec et al., “A Multiple-InstanceLearning Framework for Diabetic Retinopathy Screening.” Table 5 presentsa comparison of screening performance of some embodiments with variouscompeting approaches on the Messidor dataset, clearly showing superiordiagnostic efficacy of the disclosed embodiments. Table 6 compares theresults from the two approaches. Table 6 provides screening results(AUROC) for the two embodiments of screening system on Messidor dataset.

TABLE 6 Refer any Method retinopathy Refer >5 MAs System embodiment one0.915 0.943 System embodiment two 0.857 0.910

2. Cytomegalovirus Screening

a. General Description

Cytomegalovirus retinitis (CMVR) is a treatable infection of the retinaaffecting HIV and AIDS patients and is a leading cause of blindness inmany developing countries. In one embodiment, methods and systems forscreening of Cytomegalovirus retinitis using retinal fundus photographsis described. Visual inspection of the images from CMVR patients revealsthat, images with CMVR typically have large sub-foveal irregular patchesof retinal necrosis appearing as a white, fluffy lesion with overlyingretinal hemorrhages as seen in FIGS. 41C and 41D. These lesions haveseverely degraded image quality, for example, focus, contrast, normalcolor, when compared with images of normal retina, as shown in FIGS. 41Aand 41B. A system which can effectively capture and flag the degradationin image quality can be used to screen for CMVR. Accordingly, in oneembodiment, the image quality descriptors are adapted to the problem ofCMVR screening, providing a new use of the image quality descriptorsdescribed herein.

b. Features that can be Used for this Type of Automatic Detection

In one embodiment, the image analysis engine automatically processes theimages and extracts novel quality descriptors, using, for example, theprocess described in the section above entitled “Lens Shot ImageClassification”. These descriptors are then subjected to principalcomponent analysis (PCA) for dimensionality reduction. They can then beused to train a support vector machine (SVM) classifier in a 5-foldcross-validation framework, using images that have been pre-graded forCytomegalovirus retinitis by experts, for example, into two categories:normal retina, and retina with CMVR. In one embodiment, images graded byexperts at UCSF and Chiang Mai University Medical Centre, Thailand areemployed. The system produces a result of refer for a patient image fromcategory retina with CMVR, and no refer for a patient image fromcategory normal retina.

FIG. 42 depicts a process for one embodiment of CMVR screening. Theillustrated blocks may be implemented either on the cloud 19014, or acomputer or computing device 19004, a mobile device 19008 or the like,as shown in FIG. 1. The image 100 refers in general to the retinal data,single or multidimensional, that has been captured using a retinalimaging device, such as cameras for color image capture, fluoresceinangiography (FA), adaptive optics, optical coherence tomography (OCT),hyperspectral imaging, scanning laser ophthalmoscope (SLO), wide-fieldimaging or ultra-wide-field imaging. Image 100 is input to the imageenhancement module 106 and then input to interest region identificationblock 108 and descriptor set computation block 110. The descriptors areinput to CMVR screening block 900 to determine whether a particularfundus image includes abnormalities indicative of Cytomegalovirusinfection and that the patient needs to be referred to an expert.

One embodiment was tested using 211 images graded for CMVR, by randomlysplitting them into 40 different training-testing datasets. In eachsplit, 75% of the images were used for training and the other 25% werereserved for testing. As expected, the lesion degraded, poor qualityimages were flagged to be positive for CMVR by the system with anaverage accuracy of 85%, where average area under ROC curve (AUROC) was0.93. For many of the images, the presence of large out-of-focus,blurry, or over-/under-exposed regions, such as shown in FIGS. 41E, 41Ffor example, resulted in the degradation of image quality causing theexperts to be unsure about the presence or absence of CMVR duringscreening. These images, marked with category cannot determine, wereexcluded from the above experiments. By choosing an SVM classifier thatproduces a ROC curve with an AUROC close to the average of 0.93 obtainedduring the 40 experiments, an additional 29 images from the cannotdetermine category were tested. None of these images were includedduring training phase. The system recommended that 27 of 29 images(patients) be referred, which is acceptable given that experts too didnot have consensus on CMVR status of those two images.

In one embodiment, the quality of the image is first analyzed using a“gradability assessment” module. This module will flag blurry, saturatedor under exposed images to be of poor quality and unsuitable forreliable screening. The actual CMVR screening would then be performed onimages that have passed this quality module. Both system could use thesame descriptors, but one can use a support vector regressor enginetrained to assess quality, and the other a support vector classifiertrained to screen for CMVR. In another embodiment, additionaldescriptors are included, such as texture, color layout, and/or otherdescriptors to the CMVR screening setup to help distinguish the lesionsbetter.

3. Other Diseases

a. Alzheimer's

Patients with early forms of Alzheimer's disease (AD) display narrowerretinal veins compared to their peers without AD as discussed in FatmireBerisha et al., “Retinal Abnormalities in Early Alzheimer's Disease,”Investigative Ophthalmology & Visual Science 48, No. 5 (May 1, 2007):2285-2289. Hence, AD can be screened by customized vasculatoic analysis.

b. Stroke

The retinal arterioles may narrow as a result of chronic hypertensionand this may predict stroke and other cardiovascular diseasesindependent of blood pressure level as discussed in Tien Yin Wong,Ronald Klein, A. Richey Sharrett, David J. Couper, Barbara E. K. Klein,Duan-Ping Liao, Larry D. Hubbard, Thomas H. Mosley, “Cerebral whitematter lesion, retinopathy and risk of clinical stroke: TheAtherosclerosis Risk in the Communities Study”. JAMA 2002; 288:67-74.Thus, the system may also be used to screen for strokes.

c. Cardiovascular Diseases

The retinal arterioles may narrow as a result of chronic hypertensionand this may predict stroke and other cardiovascular diseasesindependent of blood pressure level, as discussed in Tien Y. Wong, WayneRosamond, Patricia P. Chang, David J. Couper, A. Richey Sharrett, LarryD. Hubbard, Aaron R. Folsom, Ronald Klein, “Retinopathy and risk ofcongestive heart failure”. JAMA 2005; 293:63-69. Thus, the system may beused to screen for cardiovascular diseases.

d. Retinopathy of Prematurity

Neovascularization, vessel tortuosity and increased vessel thicknessindicate retinopathy of prematurity, as discussed in Flynn Jt et al.,“Retinopathy of Prematurity. Diagnosis, Severity, and Natural History.”Ophthalmology 94, No. 6 (June 1987): 620-629. Thus, retinopathy ofprematurity can be analyzed by automated retinal image analysis toolsfor screening.

e. Macular Degeneration

Lesions may also indicate macular degeneration as discussed in A. C.Bird et al., “An International Classification and Grading System forAge-Related Maculopathy and Age-Related Macular Degeneration,” Survey ofOphthalmology 39, No. 5 (March 1995): 367-374. Thus, lesions such asdrusen bodies can be detected and localized using the lesionlocalization system described in the section above entitled “LesionLocalization” and this disease can be screened for using a similar setupas described in the section “Diabetic retinopathy screening”.

VII. Architectures

It is recognized that the systems and methods may be implemented in avariety of architectures including telemedicine screening, cloudprocessing, or using other modalities.

A. Telemedicine Screening

1. General Description

In one embodiment, the system includes a flexible applicationprogramming interface (API) for integration with existing or newtelemedicine, systems, programs, or software. The Picture Archival andCommunication System (PACS) is used as an example telemedicine serviceto enable such an integration. Block diagram of one embodiment of such asystem is shown in FIGS. 43A and 43B. The system includes an APIallowing coding of one or more of patients' metadata 1306,de-identifying images 1307 to anonymize patients for analysis andprotect privacy, analyzing image quality 1312, initiating reimaging asneeded 1316, updating patient metadata, storing images and analysisresults in database 1314, inputting 1310, and/or outputting 1308transmission interfaces. The Image Analysis System (IAS) comprises oneor more of the following: input 1318 and output 1328 transmissioninterfaces for communication with the PACS system, a database updater1320, a quality assessment block 1322 to assess image gradability 1324,an analysis engine 1326 that can include a combination of one or more ofthe following tools: disease screening, lesion dynamics analysis, orvessel dynamics analysis. In one embodiment, the PACS and/or the IASsystem could be hosted on remote or local server or other computingsystem, and in another embodiment, they could be hosted or cloudinfrastructure.

In one embodiment, the API is designed to enable seamlessinter-operation of the IAS with a telemedicine service, such as PACS,though any telemedicine system, software, program, or service could beused. An interface for one embodiment is presented in FIG. 43A.

In one embodiment, the API includes one or more of the followingfeatures:

-   -   Image data sent to IAS server: Once a patient is imaged,        relevant metadata, like retinal image field, is added, a unique        control number or identifier (id) is generated for the case, and        the patient image is de-identified by PACS. The id along with        the de-identified images and metadata is then sent to IAS, for        example block 1300 and URL 1        (https://api.eyenuk.com/eyeart/upload), using a secure protocol        such as the secure hypertext transfer protocol (HTTPS) POST        request with multi-part/form content type, which also includes        authentication from PACS and/or the user.    -   Ack sent back to PACS: Once the POST request is received by the        IAS server, the input data is validated, and the application and        user sending the data are authenticated. After authenticating        the request, an acknowledgment is sent back.    -   Image Analysis on IAS Analysis Engine: IAS image analysis        engine, for example block 1302, updates the database with the        patient images, associated data and unique id, and proceeds to        analyze the images. The images are assessed for their        gradability in multiple threads. If the images are of gradable        quality, the screening results are estimated.

2. Transfer Of Analysis Results

In one embodiment, IAS initiates the transfer of results to PACS. Inthis mode of operation, PACS would not have a control over when it wouldreceive the results. The transfer may include one or more of thefollowing:

-   -   Image analysis results sent to PACS: For images of gradable        quality, the corresponding screening results are embedded as        JSON (JavaScript Object Notation) data and sent in a new HTTPS        POST request to the PACS server using protocols discussed in        https://upload.eyepacs.com/eyeart_analysis/upload. Ungradable        images are indicated as such.    -   Ack sent back to IAS server: After receiving the results, PACS        server validates the received data and sends an acknowledgment        back, block 1304.

In another embodiment, PACS initiates the transfer of results to itssystem. In this mode of operation, PACS can choose when to retrieve theanalysis results from IAS server. This circumvents the possibility ofdata leaks, since the screening results are sent from IAS upon request.The transfer may include one or more of the following:

-   -   PACS queries for result status: Similar to the initial POST        request, the PACS server uses HTTPS POST request with        multi-part/form content type, to transmit the image ids for        which it wants to know the status of image analysis using, for        example, protocols disclosed in        https://api.eyenuk.com/eyeart/status.    -   Ack sent back to PACS: Once the POST request is received by the        IAS server, the input id is validated, and the application and        user sending the data are authenticated. An acknowledgment is        then sent back along with the status of the result, (for        example, “In Queue” or “Processing” or “Done”) for the requested        id or ids.    -   PACS queries for results: The PACS server sends an AJAX request        (for example, jQuery $.get) to asynchronously, in the        background, retrieve the results from the IAS server using, for        example, protocols disclosed in        https://api.eyenuk.com/eyeart/result. The appropriate AJAX        callbacks are set for handling events such as processing of        results once it is received, handling failure of the request, or        the like.    -   Posting results to PACS: Once the processing is done, for images        of gradable quality, the corresponding screening results are        embedded as JSON data and sent as a response to the        authenticated PACS server AJAX request. If images are ungradable        they are indicated as such in the response. This response        triggers the corresponding callback (set during the request) at        the PACS server, which could process the results and add them to        the patient database, for example block 1304.

Table 7 presents one embodiment of technical details of an interfacewith telemedicine and error codes for a Telemedicine API. The designincludes considerations directed to security, privacy, data handling,error conditions, and/or independent server operation. In oneembodiment, the PACS API key to obtain “write” permission to IAS serverwould be decided during initial integration, along with the IAS API keyto obtain “write” permission to PACS. The API URL, such ashttps://upload.eyepacs.com/eyeart_analysis/upload, for IAS to transferresults to PACS could either be set during initial registration orcommunicated each time during the POST request tohttps://api.eyenuk.com/eyeart/upload.

TABLE 7 Error Code Description 1 No images specified 2 No qualitystructure specified 3 General upload failure 4 Unique ID not specified 5Invalid signature 6 Invalid API key 7 Insufficient permissions

Table 8 shows one embodiment of details of an IAS and PACS API. Oneembodiment of error codes is described in Table 7. The URLs uses in thetable are for illustrative purposes only.

TABLE 8 Authentication Arguments Success response Error Codes URL 1:https://api.eyenuk.com/eyeart/upload API key, multi-part/form contenttype with HTTP 200 1, 3, 4, 6, 7 User ID images, unique id foridentifying images of a particular patient, dictionary containing theretinal image fields for each image. URL 2:https://upload.eyepacs.com/eyeart_analysis/upload API Key JSON objectwith unique id, HTTP 200 2, 3, 4, 6, 7 Structure with DR screeninganalysis details, Structure with quality analysis details. URL 3:https://api.eyenuk.com/eyeart/status API key, multi-part/form contenttype HTTP 200 3, 4, 6, 7 User ID with unique ids for images. URL 4:https://api.eyenuk.com/eyeart/result API key, AJAX request (possiblyjQuery HTTP 200 3, 4, 6, 7 User ID $.get) with callbacks for success andfailure.

B. Processing on the Cloud

Image processing and analysis can be performed on the cloud, includingby using systems or computing devices in the cloud. Large-scale retinalimage processing and analysis may not be feasible on normal desktopcomputers or mobile devices. Producing results in near constant timeirrespective of the size of the input dataset is possible if the retinalimage analysis solutions are to be scaled. This section describes theretinal image acquisition and analysis systems and methods according tosome embodiments, as well as the cloud infrastructure used to implementthose systems and methods.

1. Acquisition and Analysis Workflow

FIG. 44 shows an embodiment of a retinal image acquisition and analysissystem. Diabetic retinopathy patients, and patients with other visiondisorders, visit diagnostic clinics for imaging of their retina. Duringa visit, termed an encounter, multiple images of the fundus arecollected from various fields and from both the eyes for each patient.In addition to the color fundus images, photographs of the lens are alsoadded to the patient encounter images. These images are acquired byclinical technicians or trained operators, for example, on color funduscameras or portable cellphone-based cameras.

In an embodiment of cloud-based operation, the patient 449000 imagerefers to the retinal data, single or multidimensional, captured fromthe patient using a retinal imaging device, such as cameras for colorimage capture, fluorescein angiography (FA), adaptive optics, opticalcoherence tomography (OCT), hyperspectral imaging, scanning laserophthalmoscope (SLO), wide-field imaging or ultra-wide-field imaging.The acquired images are stored on the local computer or computing device449004, or mobile device 449008 and then transmitted to a central datacenter 449104. Operators at the data center can then use traditionalserver-based or computing device-based 449500, desktop-based 449004, ormobile-based 449008 clients to push these images to the cloud 449014 forfurther analysis and processing. The cloud infrastructure generatespatient-level diagnostic reports which can trickle back to the patients,for example, through the same pipeline, in reverse.

In another embodiment of cloud-based operation, the imaging setup cancommunicate with the cloud, as indicated by dotted lines in FIG. 44. Theimages can be pushed to the cloud following acquisition. The diagnosticresults are then obtained from the cloud, typically within minutes,enabling the clinicians or ophthalmologists to discuss the results withthe patients during their imaging visit. It also enables seamlessre-imaging in cases where conclusive results could not be obtained usingthe initial images.

In another embodiment of cloud-based operation, data centers storeimages from thousands of patients 449500. The data, for example, mayhave been collected as part of a clinical study for either diseaseresearch or discovery of drugs or treatments. The patient images mayhave been acquired, in preparation for the study, and then pushed to thecloud for batch-processing. The images could also be part of routineclinical workflow where the analysis is carried out in batch mode forseveral patients. The cloud infrastructure can be scaled to accommodatethe large number of patient encounters and perform retinal analysis onthe encounters. The results can be presented to the researchers in acollated fashion enabling effective statistical analysis for the study.

2. Image Analysis on the Cloud

FIG. 45 shows one embodiment of the cloud infrastructure 19014 used forretinal image processing and analysis. The client can be server-based orcomputing device-based 459500, desktop-based 459004, or mobile-based459008. In one embodiment the client may be operated by a human operator459016. The workflow can include one or more of the following:

-   -   First, the client logs-in to the web-server 459400 and requests        credentials for using the cloud infrastructure. Following this        authorization action, the client can access the various        components of the cloud infrastructure.    -   During authorization, the client can send the number of        encounters or images it plans to process in a run. Based on this        number, the web-server initializes the components of the cloud,        for example,        -   Input 459404 and output 459408 message queues: These queues            are fast, reliable and scalable message queuing services            which act as an interface between client and the cloud.            Messages in input queue indicate which encounters are ready            for analysis, while those in output queue indicate which            encounters have been analyzed on the cloud.        -   Cloud storage 459406: Can comprise a distributed network of            hard disks (magnetic or solid-state), concurrently            accessible via high-bandwidth connections to the worker            machines. They can provide high security features, such as            data encryption and firewalls, to guard against unauthorized            access. They can also provide reliability, by, for example,            redundant data storage across the network, against hardware            and data failures allowing for disaster recovery.        -   Auto scaling group 459412: Can comprise of a group of worker            machines or computing devices which can process the images            in an encounter. For example, each worker machine 459414 can            comprise of 32 or more, multi-core, 64-bit processors with            high computing power and access to high-speed random access            memory (RAM). The number of worker machines in the group is            automatically scaled, that is, new machines are created, or            old ones terminated, depending on the cloud metrics.        -   Worker machine image 459416: Software that powers each            worker machine. New machines created 459418 can be loaded            with a machine image to transform them into worker machines            459414        -   Cloud metrics 459410: Component that monitors the number of            encounters being processed by the existing machines, the            number of encounters waiting to be processed in the input            queue, and the current load on the worker machines. Auto            scaling group uses this information to scale the number of            worker machines.    -   After authorization, the client can perform some preliminary        processing of the retinal images, which may include resizing or        image enhancement.    -   The pre-processed images from an encounter are then uploaded to        cloud storage, a corresponding encounter entry, which may        contain image metadata, is made in the database 459402, and a        message object is pushed to the input message queue to let the        worker machines know that the encounter is ready for processing.        In batch-processing mode, the images are pushed to the cloud in        multiple software threads for faster uploads. After pushing the        messages to the input queue, the client polls the output message        queue for encounters that have been processed.    -   Once started, the worker machines poll the input message queue        in anticipation of encounters to process. Once a message appears        in the queue, they delete the message, access the database entry        corresponding to that encounter, and download the images for        that encounter to local memory. They then start processing and        analyzing the images for retinal diseases. Each worker machine        can process multiple images or encounters simultaneously        depending on the number of processor cores it has. During        processing, the worker machines can save intermediate data to        the cloud storage. Depending on the load each machine is        handling and the number of messages, or encounters, waiting to        be processed in the input message queue, the auto scaling        component 459412 can automatically start new worker machines,        load the required machine image, and initialize them to start        pulling messages from the input queue and to start processing        the encounters. The auto scaling component can also terminate        machines if it thinks that computing power is left idle, in view        of the volume of the new messages in the input queue.    -   After processing the images from an encounter, the worker        process writes necessary data or images back to cloud storage,        updates the corresponding encounter entry in the database with        diagnostic results, and pushes a message to the output queue to        let the client know that an encounter has been processed. If an        error occurs during processing of an encounter, the encounter        updates the database encounter entry indicating the error, and        re-pushes the message back to the input queue, for another        worker process to process the encounter. However, if the message        has been re-pushed more than a couple of times, indicating that        the encounter data itself has some problem, the worker process        can delete the message from the input queue and push it to the        output queue after updating the corresponding database entry.    -   Once a message appears in the output queue, the client deletes        it from the queue and accesses the corresponding entry in the        database to know the analysis results, or errors, if any, for an        encounter. The results are then formatted and presented to the        client. In batch-processing mode, the results for the encounters        in the run can be collated into a spreadsheet for subsequent        analysis by the client.

3. Use of Amazon Web Services

In one embodiment, the cloud operation described above has beenimplemented using Amazon Web Services™ infrastructure, and the cloudstorage is implemented using Simple Storage Service (S3). The input andoutput message queues may be implemented with Simple Queue Service(SQS). The web-server is hosted on a t1-micro Elastic Cloud Compute(EC2) instance. The database is implemented with the Relational DatabaseService (RDS) running a MySQL database instance. Each worker machine isa c3.8xlarge EC2 instance with 32-processors and 60 GB of RAM. The cloudmetrics are obtained using Cloud Watch. The scaling of EC2 capacity(automatic creation and termination of worker machines) is done usingAmazon Auto Scaling. The software that runs on each of the workermachines is stored as an Amazon Machine Image (AMI).

C. New and Other Image Modalities

1. Widefield and Ultra-Widefield Images

Widefield and ultra-widefield retinal images capture fields of view ofthe retina in a single image that are larger than 45-50 degreestypically captured in retinal fundus images. These images are obtainedeither by using special camera hardware or by creating a montage usingretinal images of different fields. The systems and methods describedherein can apply to widefield and ultra-widefield images.

2. Fluorescein Angiography Images

Fluorescein angiography involves injection of a fluorescent tracer dyefollowed by an angiogram that measures the fluorescence emitted byilluminating the retina with light of wavelength 490 nanometers. Sincethe dye is present in the blood, fluorescein angiography imageshighlight the vascular structures and lesions in the retina. The systemsand methods described herein can apply to fluorescein angiographyimages.

3. Scanning Laser and Adaptive Optics Images

Scanning laser retinal imaging uses horizontal and vertical mirrors toscan a region of the retina that is illuminated by laser while adaptiveoptics scanning laser imaging uses adaptive optics to mitigate opticalaberrations in scanning laser images. The systems and methods describedherein can apply to scanning laser and adaptive optics images.

VIII. Computing System

In some embodiments, the process of imaging is performed by a computingsystem 8000 such as that disclosed in FIG. 46.

In some embodiments, the computing system 5000 includes one or morecomputing devices, for example, a personal computer that is IBM,Macintosh, Microsoft Windows or Linux/Unix compatible or a server orworkstation. In one embodiment, the computing device comprises a server,a laptop computer, a smart phone, a personal digital assistant, a kiosk,or a media player, for example. In one embodiment, the computing deviceincludes one or more CPUS 5005, which may each include a conventional orproprietary microprocessor. The computing device further includes one ormore memory 5030, such as random access memory (“RAM”) for temporarystorage of information, one or more read only memory (“ROM”) forpermanent storage of information, and one or more mass storage device5020, such as a hard drive, diskette, solid state drive, or opticalmedia storage device. Typically, the modules of the computing device areconnected to the computer using a standard based bus system. Indifferent embodiments, the standard based bus system could beimplemented in Peripheral Component Interconnect (PCI), Microchannel,Small Computer System Interface (SCSI), Industrial Standard Architecture(ISA) and Extended ISA (EISA) architectures, for example. In addition,the functionality provided for in the components and modules ofcomputing device may be combined into fewer components and modules orfurther separated into additional components and modules.

The computing device is generally controlled and coordinated byoperating system software, such as Windows XP, Windows Vista, Windows 7,Windows 8, Windows Server, Embedded Windows, Unix, Linux, Ubuntu Linux,SunOS, Solaris, iOS, Blackberry OS, Android, or other compatibleoperating systems. In Macintosh systems, the operating system may be anyavailable operating system, such as MAC OS X. In other embodiments, thecomputing device may be controlled by a proprietary operating system.Conventional operating systems control and schedule computer processesfor execution, perform memory management, provide file system,networking, I/O services, and provide a user interface, such as agraphical user interface (GUI), among other things.

The exemplary computing device may include one or more commonlyavailable I/O interfaces and devices 5010, such as a keyboard, mouse,touchpad, touchscreen, and printer. In one embodiment, the I/Ointerfaces and devices 5010 include one or more display devices, such asa monitor or a touchscreen monitor, that allows the visual presentationof data to a user. More particularly, a display device provides for thepresentation of GUIs, application software data, and multimediapresentations, for example. The computing device may also include one ormore multimedia devices 5040, such as cameras, speakers, video cards,graphics accelerators, and microphones, for example.

In the embodiment of the imaging system tool of FIG. 46, the I/Ointerfaces and devices 5010 provide a communication interface to variousexternal devices. In the embodiment of FIG. 46, the computing device iselectronically coupled to a network 5060, which comprises one or more ofa LAN, WAN, and/or the Internet, for example, via a wired, wireless, orcombination of wired and wireless, communication link 5015. The network5060 communicates with various computing devices and/or other electronicdevices via wired or wireless communication links.

According to FIG. 46, in some embodiments, images to be processedaccording to methods and systems described herein, may be provided tothe computing system 5000 over the network 5060 from one or more datasources 5076. The data sources 5076 may include one or more internaland/or external databases, data sources, and physical data stores. Thedata sources 5076 may include databases storing data to be processedwith the imaging system 5050 according to the systems and methodsdescribed above, or the data sources 5076 may include databases forstoring data that has been processed with the imaging system 5050according to the systems and methods described above. In someembodiments, one or more of the databases or data sources may beimplemented using a relational database, such as Sybase, Oracle,CodeBase, MySQL, SQLite, and Microsoft® SQL Server, as well as othertypes of databases such as, for example, a flat file database, anentity-relationship database, and object-oriented database, NoSQLdatabase, and/or a record-based database.

In the embodiment of FIG. 46, the computing system 5000 includes animaging system module 5050 that may be stored in the mass storage device5020 as executable software codes that are executed by the CPU 5005.These modules may include, by way of example, components, such assoftware components, object-oriented software components, classcomponents and task components, processes, functions, attributes,procedures, subroutines, segments of program code, drivers, firmware,microcode, circuitry, data, databases, data structures, tables, arrays,and variables. In the embodiment shown in FIG. 46, the computing system5000 is configured to execute the imaging system module 5050 in order toperform, for example, automated low-level image processing, automatedimage registration, automated image assessment, automated screening,and/or to implement new architectures described above.

In general, the word “module,” as used herein, refers to logic embodiedin hardware or firmware, or to a collection of software instructions,possibly having entry and exit points, written in a programminglanguage, such as, for example, Python, Java, Lua, C and/or C++. Asoftware module may be compiled and linked into an executable program,installed in a dynamic link library, or may be written in an interpretedprogramming language such as, for example, BASIC, Perl, or Python. Itwill be appreciated that software modules may be callable from othermodules or from themselves, and/or may be invoked in response todetected events or interrupts. Software modules configured for executionon computing devices may be provided on a computer readable medium, suchas a compact disc, digital video disc, flash drive, or any othertangible medium. Such software code may be stored, partially or fully,on a memory device of the executing computing device, such as thecomputing system 5000, for execution by the computing device. Softwareinstructions may be embedded in firmware, such as an EPROM. It will befurther appreciated that hardware modules may be comprised of connectedlogic units, such as gates and flip-flops, and/or may be comprised ofprogrammable units, such as programmable gate arrays or processors. Theblock diagrams disclosed herein may be implemented as modules. Themodules described herein are preferably implemented as software modulesbut may be represented in hardware or firmware. Generally, the modulesdescribed herein refer to logical modules that may be combined withother modules or divided into sub-modules despite their physicalorganization or storage.

IX. Additional Embodiments

Each of the processes, methods, and algorithms described in thepreceding sections may be embodied in, and fully or partially automatedby, code modules executed by one or more computer systems or computerprocessors comprising computer hardware. The code modules may be storedon any type of non-transitory computer-readable medium or computerstorage device, such as hard drives, solid state memory, optical disc,and/or the like. The systems and modules may also be transmitted asgenerated data signals (for example, as part of a carrier wave or otheranalog or digital propagated signal) on a variety of computer-readabletransmission mediums, including wireless-based and wired/cable-basedmediums, and may take a variety of forms (for example, as part of asingle or multiplexed analog signal, or as multiple discrete digitalpackets or frames). The processes and algorithms may be implementedpartially or wholly in application-specific circuitry. The results ofthe disclosed processes and process steps may be stored, persistently orotherwise, in any type of non-transitory computer storage such as, forexample, volatile or non-volatile storage.

The various features and processes described above may be usedindependently of one another or may be combined in various ways. Allpossible combinations and subcombinations are intended to fall withinthe scope of this disclosure. In addition, certain method or processblocks may be omitted in some implementations. The methods and processesdescribed herein are also not limited to any particular sequence, andthe blocks or states relating thereto can be performed in othersequences that are appropriate. For example, described blocks or statesmay be performed in an order other than that specifically disclosed, ormultiple blocks or states may be combined in a single block or state.The example blocks or states may be performed in serial, in parallel, orin some other manner. Blocks or states may be added to or removed fromthe disclosed example embodiments. The example systems and componentsdescribed herein may be configured differently than described. Forexample, elements may be added to, removed from, or rearranged comparedto the disclosed example embodiments.

Conditional language, such as, among others, “can,” “could,” “might,” or“may,” unless specifically stated otherwise, or otherwise understoodwithin the context as used, is generally intended to convey that certainembodiments include, while other embodiments do not include, certainfeatures, elements and/or steps. Thus, such conditional language is notgenerally intended to imply that features, elements and/or steps are inany way required for one or more embodiments or that one or moreembodiments necessarily include logic for deciding, with or without userinput or prompting, whether these features, elements and/or steps areincluded or are to be performed in any particular embodiment. The term“including” means “included but not limited to.” The term “or” means“and/or”.

Any process descriptions, elements, or blocks in the flow or blockdiagrams described herein and/or depicted in the attached figures shouldbe understood as potentially representing modules, segments, or portionsof code which include one or more executable instructions forimplementing specific logical functions or steps in the process.Alternate implementations are included within the scope of theembodiments described herein in which elements or functions may bedeleted, executed out of order from that shown or discussed, includingsubstantially concurrently or in reverse order, depending on thefunctionality involved, as would be understood by those skilled in theart.

All of the methods and processes described above may be embodied in, andpartially or fully automated via, software code modules executed by oneor more general purpose computers. For example, the methods describedherein may be performed by the computing

What is claimed is:
 1. A computing system for enhancing a retinal image,the computing system comprising: one or more hardware computerprocessors; and one or more storage devices configured to store softwareinstructions configured for execution by the one or more hardwarecomputer processors in order to cause the computing system to: access amedical retinal image I for enhancement, the medical retinal imagerelated to a subject; estimate the background

of the image at single or multiple scales; and scale the intensityI(x,y) at a first pixel location in the medical retinal image adaptivelybased on the intensity at a same position in the background image

(x,y) for generating an enhanced image.